This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add two TSFlags: IsAtomicNoRtn and IsAtomicRtn
ClosedPublic

Authored by rampitec on Feb 12 2021, 3:49 PM.

Details

Reviewers

arsenm
foad
t-tye

Commits

rG5cf9292ce341: [AMDGPU] Add two TSFlags: IsAtomicNoRtn and IsAtomicRtn

Summary

We use the AtomicNoRet map in multiple places to determine
whether an instruction is an atomic, and whether it is the rtn
or the nortn form. This method does not always work, since some
instructions have only an rtn or only a nortn version.

One such instruction is ds_wrxchg_rtn_b32, which has no
nortn version. Classifying it correctly has caused changes
in the memory legalizer tests.
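
The difference between the two approaches can be sketched in standalone C++. This is a simplified model under stated assumptions, not the LLVM code: the bit positions mirror the SIDefines.h hunk of this revision, while `InstDesc`, `NoRetTwin` and `isAtomicRetViaMap` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions mirror the SIDefines.h hunk in this revision.
constexpr uint64_t IsAtomicNoRet = UINT64_C(1) << 57;
constexpr uint64_t IsAtomicRet = UINT64_C(1) << 58;

// Hypothetical stand-in for an instruction description; not an LLVM type.
struct InstDesc {
  uint64_t TSFlags; // target-specific flags, set in the .td files
  int NoRetTwin;    // opcode of the no-return counterpart, or -1 if none
};

// Modelled old check: infer "returning atomic" from the existence of a
// no-return counterpart in the map.
constexpr bool isAtomicRetViaMap(const InstDesc &D) {
  return D.NoRetTwin != -1;
}

// Modelled new checks: read the flag bits directly.
constexpr bool isAtomicNoRet(const InstDesc &D) {
  return (D.TSFlags & IsAtomicNoRet) != 0;
}
constexpr bool isAtomicRet(const InstDesc &D) {
  return (D.TSFlags & IsAtomicRet) != 0;
}
constexpr bool isAtomic(const InstDesc &D) {
  return (D.TSFlags & (IsAtomicRet | IsAtomicNoRet)) != 0;
}

// A ds_wrxchg_rtn_b32-like instruction: returns a value, has no nortn twin.
constexpr InstDesc WrxchgRtn{IsAtomicRet, -1};

static_assert(!isAtomicRetViaMap(WrxchgRtn), "the map-based check misses it");
static_assert(isAtomicRet(WrxchgRtn), "the flag-based check identifies it");

// Runtime version of the same observation.
inline void checkWrxchg() { assert(isAtomic(WrxchgRtn)); }
```

The flag-based check does not depend on a counterpart instruction existing, which is the property the summary relies on.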

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rampitec created this revision. Feb 12 2021, 3:49 PM

Herald added subscribers: kerbowa, jfb, hiraditya and 6 others. Feb 12 2021, 3:49 PM

rampitec requested review of this revision. Feb 12 2021, 3:49 PM

Herald added a project: Restricted Project. Feb 12 2021, 3:49 PM

Herald added a subscriber: wdng.

@t-tye memory legalizer changes are due to the fact that SIMemoryLegalizer::isAtomicRet() was returning false for the ds_wrxchg_rtn_b32.

Please verify whether the tests now look correct. I am checking the memory model for GFX10 and I do not see vmcnt required in these situations, but I also do not see the vscnt that was generated there before. It looks like this has uncovered a bug in either the legalizer (likely) or the memory model description. On top of that, I do not see the memory model describing the atomicrmw acquire agent local combination.

I.e. I understand why vscnt is replaced with vmcnt: it was treated as a store and is now treated as a load. This is fine. What I don't understand is what it has to do with vmem at all.

rampitec added inline comments. Feb 12 2021, 5:05 PM

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
572	@foad do I get it right that all MIMG atomics are rtn only? MIMG has never used the AtomicNoRet map, so this branch was dead. Maybe we were missing a wait for exp, maybe they are all mayStore, or maybe it is really dead and unused. Potentially you will see some extra exp waits, but none appeared in our tests. That said, I am afraid test coverage for waitcounts is not sufficient.

In D96639#2561212, @rampitec wrote:

On top of that I do not see memory model describing atomicrmw acquire agent local combination.

It is not present because the local address space is per workgroup and so only supports up to workgroup scope. Using any larger scope "decays" to workgroup scope.

In D96639#2561302, @rampitec wrote:

I.e. I understand why vscnt is replaced with vmcnt: it was treated as a store and is now treated as a load. This is fine. What I don't understand is what it has to do with vmem at all.

The test result changes look correct. The legalizer-generated code was not incorrect before, it was just not optimal. Generating the vmcnt or vscnt is not required at all, as the ds_wrxchg_rtn instruction only acts on local memory, which is only accessible up to workgroup scope. Accessing it at a wider scope "decays" to workgroup scope. It would be good to consider updating the memory legalizer to exploit this fact.
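
The "decay" rule described above can be sketched as a small standalone function. This is only an illustration under assumptions: the scope list is simplified, and `Scope`, `AddrSpace` and `effectiveScope` are hypothetical names, not part of SIMemoryLegalizer.

```cpp
#include <cassert>

// Synchronization scopes ordered from narrowest to widest (simplified list).
enum class Scope { SingleThread, Wavefront, Workgroup, Agent, System };

// Only the two address spaces needed for the illustration.
enum class AddrSpace { Global, Local };

// Local memory is per workgroup, so any scope wider than workgroup
// behaves as workgroup scope when only local memory is accessed.
constexpr Scope effectiveScope(Scope Requested, AddrSpace AS) {
  return (AS == AddrSpace::Local && Requested > Scope::Workgroup)
             ? Scope::Workgroup
             : Requested;
}

static_assert(effectiveScope(Scope::Agent, AddrSpace::Local) ==
                  Scope::Workgroup,
              "agent scope on local memory decays to workgroup");
static_assert(effectiveScope(Scope::Agent, AddrSpace::Global) == Scope::Agent,
              "global memory keeps the requested scope");
```

Under this model an `atomicrmw acquire` at agent scope on a local address needs no more synchronization than the same operation at workgroup scope.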

This revision is now accepted and ready to land. Feb 12 2021, 8:21 PM

In D96639#2561398, @t-tye wrote:

It is not present because the local address space is per workgroup and so only supports up to workgroup scope. Using any larger scope "decays" to workgroup scope.

That's my understanding, but it probably deserves some explanation of scope nesting in the documentation. Maybe references to scopes in the memory model should read ">= a scope"? I am just playing the role of an unsophisticated user who came to read it. I understand why it is not written down and why a wider scope decays to a narrower one, but coming to this table after a long day of bit cracking, it stunned me for a moment. Just a personal experience: I didn't want to think, I just wanted an answer, and it was not readily available. I had to think where I probably shouldn't have to. In many cases thinking leads to unhealthy ideas when it comes to the memory model. After all, we test it for some reason yet don't document it.

In D96639#2561629, @rampitec wrote:

That's my understanding, but it probably deserves some explanation of scope nesting in the documentation. Maybe references to scopes in the memory model should read ">= a scope"? I am just playing the role of an unsophisticated user who came to read it. I understand why it is not written down and why a wider scope decays to a narrower one, but coming to this table after a long day of bit cracking, it stunned me for a moment. Just a personal experience: I didn't want to think, I just wanted an answer, and it was not readily available. I had to think where I probably shouldn't have to. In many cases thinking leads to unhealthy ideas when it comes to the memory model. After all, we test it for some reason yet don't document it.

The AMDGPU memory scopes are defined in https://llvm.org/docs/AMDGPUUsage.html#amdgpu-memory-scopes which is referenced at the beginning of the memory model section. Reading that, I think, addresses all the questions. That section also references the HSA and OpenCL specifications that more formally define the memory model supported by AMDGPU. I agree that thinking about memory models can be detrimental to your mental health and should be avoided when working on code on Friday evenings when possible :-)

Rebased. Unfortunately D96643 did not remove waits on vmem for GFX10 LDS accesses.

Looks good. Can you remove the getAtomicRetOp table if it's not used for anything?

llvm/lib/Target/AMDGPU/SIDefines.h
111	This one lacks a comment. Can you spell it "IsAtomicNoRet"? I don't think we use the "rtn"/"nortn" spelling anywhere else.
114	"IsAtomicRet"?

foad added inline comments. Feb 15 2021, 7:15 AM

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
572	That's not true, architecturally all image atomics have ret/noret versions controlled by the glc bit just like buffer instructions. But I guess in LLVM we have never bothered to select the noret instructions.

foad added inline comments. Feb 15 2021, 7:38 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.h
557	I am surprised that there isn't already a MachineInstr::IsAtomic method.

foad added inline comments. Feb 15 2021, 8:12 AM

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
572	Incidentally I don't understand this code at all. Surely all atomics are both mayLoad and mayStore, so this will have taken the "if" path on line 570?

t-tye added inline comments. Feb 15 2021, 8:15 AM

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
572	The hardware can be configured as to whether IMG instructions behave as non-IMG instructions with respect to the waitcnts and the ordering. For HSA the hardware is configured to make them behave the same.

foad added inline comments. Feb 15 2021, 8:43 AM

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
572	> Incidentally I don't understand this code at all. Surely all atomics are both mayLoad and mayStore, so this will have taken the "if" path on line 570?
	Just to prove my point, this still passes the test suite: https://reviews.llvm.org/differential/diff/323776/

rampitec added inline comments. Feb 15 2021, 8:50 AM

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
572	I am afraid we do not have tests covering every path in this pass.

foad added inline comments. Feb 15 2021, 8:53 AM

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
572	I am saying this is dead code because every atomic is mayLoad and mayStore. Do you disagree?

rampitec added inline comments. Feb 15 2021, 8:54 AM

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
572	Theoretically an atomic load can escape that.
llvm/lib/Target/AMDGPU/SIInstrInfo.h
557	Me either.
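
The disagreement about line 572 comes down to which branch of an `if (mayStore) ... else if (isAtomicRet) ...` chain an instruction reaches. A minimal standalone sketch of that chain follows; `MemInst`, `ExpBranch` and `pickBranch` are hypothetical names, and the atomic load carrying the returning-atomic flag is an assumption made only to show the theoretical case.

```cpp
#include <cassert>

// Which arm of the modelled if/else-if chain handles the instruction.
enum class ExpBranch { Store, AtomicRet, None };

// Hypothetical summary of the properties the chain looks at.
struct MemInst {
  bool MayLoad;
  bool MayStore;
  bool IsAtomicRet;
};

// Mirrors the shape "if (mayStore) {...} else if (isAtomicRet) {...}".
constexpr ExpBranch pickBranch(const MemInst &I) {
  return I.MayStore      ? ExpBranch::Store
         : I.IsAtomicRet ? ExpBranch::AtomicRet
                         : ExpBranch::None;
}

// A read-modify-write atomic is both mayLoad and mayStore.
constexpr MemInst RmwRet{true, true, true};
// Assumption for illustration: an atomic load flagged as a returning atomic.
constexpr MemInst FlaggedAtomicLoad{true, false, true};

static_assert(pickBranch(RmwRet) == ExpBranch::Store,
              "RMW atomics never reach the else-if arm");
static_assert(pickBranch(FlaggedAtomicLoad) == ExpBranch::AtomicRet,
              "only a non-storing atomic can reach the else-if arm");
```

In this model the second arm is unreachable for read-modify-write atomics and reachable only for an atomic that does not store, which matches both positions taken in the thread.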

rampitec added inline comments. Feb 15 2021, 9:00 AM

llvm/lib/Target/AMDGPU/SIDefines.h
111	Actually we are using "rtn" in the td files. I looked at which we use more, "ret" or "rtn", and got the impression that "rtn" is used more often. I do not have a strong preference though.

Addressed Jay's comments.

LGTM, thanks.

llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
115	I think this is dead too since it is covered by the mayStore just above.

Closed by commit rG5cf9292ce341: [AMDGPU] Add two TSFlags: IsAtomicNoRtn and IsAtomicRtn (authored by rampitec). Feb 15 2021, 11:28 AM

This revision was automatically updated to reflect the committed changes.

rampitec added a commit: rG5cf9292ce341: [AMDGPU] Add two TSFlags: IsAtomicNoRtn and IsAtomicRtn.

rampitec added inline comments. Feb 15 2021, 11:31 AM

llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
115	There are atomic loads. Although these are not marked as atomics, which may be a problem. As far as I understand this is just a load with glc.

t-tye added inline comments. Feb 15 2021, 2:32 PM

llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
115	> There are atomic loads. Although these are not marked as atomics, which may be a problem. As far as I understand this is just a load with glc.
	Not all atomics are read-modify-write. There are atomic loads, atomic stores, and rmw atomics that do not consume the result.
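
The first two checks of `isValidClauseInst` from the diff can be modelled to show which kinds of atomics each check rejects. This is a sketch, not the pass itself: `ClauseCand`, `Verdict` and `classify` are hypothetical names, and the atomic load marked as atomic is an assumption (the thread notes such loads are not currently marked).

```cpp
#include <cassert>

// Hypothetical summary of the properties the two checks look at.
struct ClauseCand {
  bool MayLoad;
  bool MayStore;
  bool IsAtomic;
};

enum class Verdict { RejectedByLoadStore, RejectedAsAtomic, PassedBothChecks };

// Mirrors: if (!mayLoad || mayStore) return false;
//          if (isAtomic)             return false;
constexpr Verdict classify(const ClauseCand &C) {
  return (!C.MayLoad || C.MayStore) ? Verdict::RejectedByLoadStore
         : C.IsAtomic               ? Verdict::RejectedAsAtomic
                                    : Verdict::PassedBothChecks;
}

constexpr ClauseCand PlainLoad{true, false, false};
constexpr ClauseCand AtomicRmw{true, true, true};
constexpr ClauseCand AtomicStore{false, true, true};
// Assumption for illustration: an atomic load that is marked as atomic.
constexpr ClauseCand MarkedAtomicLoad{true, false, true};

static_assert(classify(AtomicRmw) == Verdict::RejectedByLoadStore,
              "RMW atomics are filtered by the mayStore check");
static_assert(classify(MarkedAtomicLoad) == Verdict::RejectedAsAtomic,
              "only a load-only atomic reaches the atomic check");
```

So the atomic check matters only for load-only atomics; read-modify-write atomics and atomic stores are already filtered by the preceding load/store test.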

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/
	2 lines
	4 lines
	3 lines
	29 lines
	8 lines
SIFormMemoryClauses.cpp	3 lines
	16 lines
	10 lines
	29 lines
	9 lines
SIMemoryLegalizer.cpp	2 lines
llvm/test/CodeGen/AMDGPU/
memory-legalizer-local-agent.ll	18 lines
memory-legalizer-local-system.ll	18 lines
memory-legalizer-local-workgroup.ll	18 lines

Diff 323811

llvm/lib/Target/AMDGPU/BUFInstructions.td

@@ ... @@ : MUBUF_Atomic_Pseudo<opName, addrKindCopy,
                          (outs),
                          getMUBUFAtomicIns<addrKindCopy, vdataClassCopy, 0>.ret,
                          " $vdata, " # getMUBUFAsmOps<addrKindCopy>.ret # "$slc",
                          pattern>,
      AtomicNoRet<opName # "_" # getAddrName<addrKindCopy>.ret, 0> {
    let PseudoInstr = opName # "_" # getAddrName<addrKindCopy>.ret;
    let glc_value = 0;
    let dlc_value = 0;
+   let IsAtomicNoRet = 1;
    let AsmMatchConverter = "cvtMubufAtomic";
  }

  class MUBUF_AtomicRet_Pseudo<string opName, int addrKind,
                               RegisterClass vdataClass,
                               list<dag> pattern=[],
                               // Workaround bug bz30254
                               int addrKindCopy = addrKind,
                               RegisterClass vdataClassCopy = vdataClass>
    : MUBUF_Atomic_Pseudo<opName, addrKindCopy,
                          (outs vdataClassCopy:$vdata),
                          getMUBUFAtomicIns<addrKindCopy, vdataClassCopy, 1>.ret,
                          " $vdata, " # getMUBUFAsmOps<addrKindCopy>.ret # "$glc1$slc",
                          pattern>,
      AtomicNoRet<opName # "_" # getAddrName<addrKindCopy>.ret, 1> {
    let PseudoInstr = opName # "_rtn_" # getAddrName<addrKindCopy>.ret;
    let glc_value = 1;
    let dlc_value = 0;
+   let IsAtomicRet = 1;
    let Constraints = "$vdata = $vdata_in";
    let DisableEncoding = "$vdata_in";
    let AsmMatchConverter = "cvtMubufAtomicReturn";
  }

  multiclass MUBUF_Pseudo_Atomics_NO_RTN <string opName,
                                          RegisterClass vdataClass,
                                          ValueType vdataType,

llvm/lib/Target/AMDGPU/DSInstructions.td

@@ ... @@
  class DS_1A1D_NORET<string opName, RegisterClass rc = VGPR_32>
  : DS_Pseudo<opName,
    (outs),
    (ins VGPR_32:$addr, rc:$data0, offset:$offset, gds:$gds),
    " $addr, $data0$offset$gds"> {

    let has_data1 = 0;
    let has_vdst = 0;
+   let IsAtomicNoRet = 1;
  }

  multiclass DS_1A1D_NORET_mc<string opName, RegisterClass rc = VGPR_32> {
    def "" : DS_1A1D_NORET<opName, rc>,
             AtomicNoRet<opName, 0>;

    let has_m0_read = 0 in {
      def _gfx9 : DS_1A1D_NORET<opName, rc>,
                  AtomicNoRet<opName#"_gfx9", 0>;
    }
  }

  class DS_1A2D_NORET<string opName, RegisterClass rc = VGPR_32>
  : DS_Pseudo<opName,
    (outs),
    (ins VGPR_32:$addr, rc:$data0, rc:$data1, offset:$offset, gds:$gds),
    " $addr, $data0, $data1$offset$gds"> {

    let has_vdst = 0;
+   let IsAtomicNoRet = 1;
  }

  multiclass DS_1A2D_NORET_mc<string opName, RegisterClass rc = VGPR_32> {
    def "" : DS_1A2D_NORET<opName, rc>,
             AtomicNoRet<opName, 0>;

    let has_m0_read = 0 in {
      def _gfx9 : DS_1A2D_NORET<opName, rc>,
@@ ... @@
  class DS_1A1D_RET <string opName, RegisterClass rc = VGPR_32>
  : DS_Pseudo<opName,
    (outs rc:$vdst),
    (ins VGPR_32:$addr, rc:$data0, offset:$offset, gds:$gds),
    " $vdst, $addr, $data0$offset$gds"> {

    let hasPostISelHook = 1;
    let has_data1 = 0;
+   let IsAtomicRet = 1;
  }

  multiclass DS_1A1D_RET_mc <string opName, RegisterClass rc = VGPR_32,
                             string NoRetOp = ""> {
    def "" : DS_1A1D_RET<opName, rc>,
             AtomicNoRet<NoRetOp, !ne(NoRetOp, "")>;

    let has_m0_read = 0 in {
      def _gfx9 : DS_1A1D_RET<opName, rc>,
                  AtomicNoRet<!if(!eq(NoRetOp, ""), "", NoRetOp#"_gfx9"),
                              !ne(NoRetOp, "")>;
    }
  }

  class DS_1A2D_RET<string opName,
                    RegisterClass rc = VGPR_32,
                    RegisterClass src = rc>
  : DS_Pseudo<opName,
    (outs rc:$vdst),
    (ins VGPR_32:$addr, src:$data0, src:$data1, offset:$offset, gds:$gds),
    " $vdst, $addr, $data0, $data1$offset$gds"> {

    let hasPostISelHook = 1;
+   let IsAtomicRet = 1;
  }

  multiclass DS_1A2D_RET_mc<string opName,
                            RegisterClass rc = VGPR_32,
                            string NoRetOp = "",
                            RegisterClass src = rc> {
    def "" : DS_1A2D_RET<opName, rc, src>,
             AtomicNoRet<NoRetOp, !ne(NoRetOp, "")>;

llvm/lib/Target/AMDGPU/FLATInstructions.td

@@ ... @@ FLAT_Pseudo<opName, outs, ins, asm, pattern> {
    let mayLoad = 1;
    let mayStore = 1;
    let has_glc = 0;
    let glcValue = 0;
    let has_dlc = 0;
    let dlcValue = 0;
    let has_vdst = 0;
    let maybeAtomic = 1;
+   let IsAtomicNoRet = 1;
  }

  class FLAT_AtomicRet_Pseudo<string opName, dag outs, dag ins,
                              string asm, list<dag> pattern = []>
    : FLAT_AtomicNoRet_Pseudo<opName, outs, ins, asm, pattern> {
    let hasPostISelHook = 1;
    let has_vdst = 1;
    let glcValue = 1;
    let dlcValue = 0;
+   let IsAtomicNoRet = 0;
+   let IsAtomicRet = 1;
    let PseudoInstr = NAME # "_RTN";
  }

  multiclass FLAT_Atomic_Pseudo<
    string opName,
    RegisterClass vdst_rc,
    ValueType vt,
    SDPatternOperator atomic = null_frag,

llvm/lib/Target/AMDGPU/MIMGInstructions.td

@@ ... @@ class MIMGBaseOpcode : PredicateControl {
    bit Sampler = 0;
    bit Gather4 = 0;
    bits<8> NumExtraArgs = 0;
    bit Gradients = 0;
    bit G16 = 0;
    bit Coordinates = 1;
    bit LodOrClampOrMip = 0;
    bit HasD16 = 0;
+   bit IsAtomicRet = 0;
  }

  def MIMGBaseOpcode : GenericEnum {
    let FilterClass = "MIMGBaseOpcode";
  }

  def MIMGBaseOpcodesTable : GenericTable {
    let FilterClass = "MIMGBaseOpcode";
@@ ... @@ let VAddrDwords = 4 in {
          def _V4_gfx10 : MIMG_Atomic_gfx10 <op, asm, data_rc, VReg_128, 0>;
          def _V4_nsa_gfx10 : MIMG_Atomic_nsa_gfx10 <op, asm, data_rc, 4, enableDasm>;
        }
      }
    }
  }

  multiclass MIMG_Atomic <mimgopc op, string asm, bit isCmpSwap = 0, bit isFP = 0> { // 64-bit atomics
+   let IsAtomicRet = 1 in {
    def "" : MIMGBaseOpcode {
      let Atomic = 1;
      let AtomicX2 = isCmpSwap;
    }

    let BaseOpcode = !cast<MIMGBaseOpcode>(NAME) in {
      // _V* variants have different dst size, but the size is encoded implicitly,
      // using dmask and tfe. Only 32-bit variant is registered with disassembler.
      // Other variants are reconstructed by disassembler using dmask and tfe.
      let VDataDwords = !if(isCmpSwap, 2, 1) in
      defm _V1 : MIMG_Atomic_Addr_Helper_m <op, asm, !if(isCmpSwap, VReg_64, VGPR_32), 1, isFP>;
      let VDataDwords = !if(isCmpSwap, 4, 2) in
      defm _V2 : MIMG_Atomic_Addr_Helper_m <op, asm, !if(isCmpSwap, VReg_128, VReg_64), 0, isFP>;
    }
+   } // End IsAtomicRet = 1
  }

  class MIMG_Sampler_Helper <mimgopc op, string asm, RegisterClass dst_rc,
                             RegisterClass src_rc, string dns="">
    : MIMG_gfx6789 <op.BASE, (outs dst_rc:$vdata), dns> {
    let InOperandList = !con((ins src_rc:$vaddr, SReg_256:$srsrc, SReg_128:$ssamp,
                                  DMask:$dmask, UNorm:$unorm, GLC:$glc, SLC:$slc,
                                  R128A16:$r128, TFE:$tfe, LWE:$lwe, DA:$da),

llvm/lib/Target/AMDGPU/SIDefines.h

@@ ... @@ enum : uint64_t {

    // Is a MFMA instruction.
    IsMAI = UINT64_C(1) << 54,

    // Is a DOT instruction.
    IsDOT = UINT64_C(1) << 55,

    // FLAT instruction accesses FLAT_SCRATCH segment.
-   IsFlatScratch = UINT64_C(1) << 56
+   IsFlatScratch = UINT64_C(1) << 56,
+
+   // Atomic without return.
+   IsAtomicNoRet = UINT64_C(1) << 57,
+
+   // Atomic with return.
+   IsAtomicRet = UINT64_C(1) << 58
  };

  // v_cmp_class_* etc. use a 10-bit mask for what operation is checked.
  // The result is true if any of these tests are true.
  enum ClassFlags : unsigned {
    S_NAN = 1 << 0,        // Signaling NaN
    Q_NAN = 1 << 1,        // Quiet NaN
    N_INFINITY = 1 << 2,   // Negative infinity

llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp

@@ ... @@
  // There no sense to create store clauses, they do not define anything,
  // thus there is nothing to set early-clobber.
  static bool isValidClauseInst(const MachineInstr &MI, bool IsVMEMClause) {
    assert(!MI.isDebugInstr() && "debug instructions should not reach here");
    if (MI.isBundled())
      return false;
    if (!MI.mayLoad() || MI.mayStore())
      return false;
-   if (AMDGPU::getAtomicNoRetOp(MI.getOpcode()) != -1 ||
-       AMDGPU::getAtomicRetOp(MI.getOpcode()) != -1)
+   if (SIInstrInfo::isAtomic(MI))
      return false;
    if (IsVMEMClause && !isVMEMClauseInst(MI))
      return false;
    if (!IsVMEMClause && !isSMEMClauseInst(MI))
      return false;
    // If this is a load instruction where the result has been coalesced with an operand, then we cannot clause it.
    for (const MachineOperand &ResMO : MI.defs()) {
      Register ResReg = ResMO.getReg();

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp

Show First 20 Lines • Show All 532 Lines • ▼ Show 20 Lines	if (TII->isDS(Inst) && (Inst.mayStore() \|\| Inst.mayLoad())) {
}		}
if (AMDGPU::getNamedOperandIdx(Inst.getOpcode(),		if (AMDGPU::getNamedOperandIdx(Inst.getOpcode(),
AMDGPU::OpName::data1) != -1) {		AMDGPU::OpName::data1) != -1) {
setExpScore(&Inst, TII, TRI, MRI,		setExpScore(&Inst, TII, TRI, MRI,
AMDGPU::getNamedOperandIdx(Inst.getOpcode(),		AMDGPU::getNamedOperandIdx(Inst.getOpcode(),
AMDGPU::OpName::data1),		AMDGPU::OpName::data1),
CurrScore);		CurrScore);
}		}
} else if (AMDGPU::getAtomicNoRetOp(Inst.getOpcode()) != -1 &&		} else if (SIInstrInfo::isAtomicRet(Inst) &&
Inst.getOpcode() != AMDGPU::DS_GWS_INIT &&		Inst.getOpcode() != AMDGPU::DS_GWS_INIT &&
Inst.getOpcode() != AMDGPU::DS_GWS_SEMA_V &&		Inst.getOpcode() != AMDGPU::DS_GWS_SEMA_V &&
Inst.getOpcode() != AMDGPU::DS_GWS_SEMA_BR &&		Inst.getOpcode() != AMDGPU::DS_GWS_SEMA_BR &&
Inst.getOpcode() != AMDGPU::DS_GWS_SEMA_P &&		Inst.getOpcode() != AMDGPU::DS_GWS_SEMA_P &&
Inst.getOpcode() != AMDGPU::DS_GWS_BARRIER &&		Inst.getOpcode() != AMDGPU::DS_GWS_BARRIER &&
Inst.getOpcode() != AMDGPU::DS_APPEND &&		Inst.getOpcode() != AMDGPU::DS_APPEND &&
Inst.getOpcode() != AMDGPU::DS_CONSUME &&		Inst.getOpcode() != AMDGPU::DS_CONSUME &&
Inst.getOpcode() != AMDGPU::DS_ORDERED_COUNT) {		Inst.getOpcode() != AMDGPU::DS_ORDERED_COUNT) {
for (unsigned I = 0, E = Inst.getNumOperands(); I != E; ++I) {		for (unsigned I = 0, E = Inst.getNumOperands(); I != E; ++I) {
const MachineOperand &Op = Inst.getOperand(I);		const MachineOperand &Op = Inst.getOperand(I);
if (Op.isReg() && !Op.isDef() && TRI->isVGPR(*MRI, Op.getReg())) {		if (Op.isReg() && !Op.isDef() && TRI->isVGPR(*MRI, Op.getReg())) {
setExpScore(&Inst, TII, TRI, MRI, I, CurrScore);		setExpScore(&Inst, TII, TRI, MRI, I, CurrScore);
}		}
}		}
}		}
} else if (TII->isFLAT(Inst)) {		} else if (TII->isFLAT(Inst)) {
if (Inst.mayStore()) {		if (Inst.mayStore()) {
setExpScore(		setExpScore(
&Inst, TII, TRI, MRI,		&Inst, TII, TRI, MRI,
AMDGPU::getNamedOperandIdx(Inst.getOpcode(), AMDGPU::OpName::data),		AMDGPU::getNamedOperandIdx(Inst.getOpcode(), AMDGPU::OpName::data),
CurrScore);		CurrScore);
} else if (AMDGPU::getAtomicNoRetOp(Inst.getOpcode()) != -1) {		} else if (SIInstrInfo::isAtomicRet(Inst)) {
setExpScore(		setExpScore(
&Inst, TII, TRI, MRI,		&Inst, TII, TRI, MRI,
AMDGPU::getNamedOperandIdx(Inst.getOpcode(), AMDGPU::OpName::data),		AMDGPU::getNamedOperandIdx(Inst.getOpcode(), AMDGPU::OpName::data),
CurrScore);		CurrScore);
}		}
} else if (TII->isMIMG(Inst)) {		} else if (TII->isMIMG(Inst)) {
if (Inst.mayStore()) {		if (Inst.mayStore()) {
setExpScore(&Inst, TII, TRI, MRI, 0, CurrScore);		setExpScore(&Inst, TII, TRI, MRI, 0, CurrScore);
} else if (AMDGPU::getAtomicNoRetOp(Inst.getOpcode()) != -1) {		} else if (SIInstrInfo::isAtomicRet(Inst)) {
rampitec: @foad do I get it right that all MIMG atomics are rtn only? MIMG has never used the AtomicNoRet map, so this branch was dead. Maybe we were missing a wait for exp, maybe they all mayStore, or maybe it is really dead and unused. Potentially you will see some extra exp waits, but none appeared in our tests. That said, I am afraid test coverage for waitcounts is not sufficient.
foad: That's not true; architecturally all image atomics have ret/noret versions controlled by the glc bit, just like buffer instructions. But I guess in LLVM we have never bothered to select the noret instructions.
foad: Incidentally I don't understand this code at all. Surely all atomics are both mayLoad and mayStore, so this will have taken the "if" path on line 570?
foad: > Incidentally I don't understand this code at all. Surely all atomics are both mayLoad and mayStore, so this will have taken the "if" path on line 570?
Just to prove my point, this still passes the test suite: https://reviews.llvm.org/differential/diff/323776/
rampitec: I am afraid we do not have tests covering every path in this pass.
foad: I am saying this is dead code because every atomic is mayLoad and mayStore. Do you disagree?
rampitec: Theoretically an atomic load can escape that.
t-tye: The hardware can be configured as to whether IMG instructions behave as non-IMG instructions with respect to the waitcnts and the ordering. For HSA the hardware is configured to make them behave the same.
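The dead-code question in this thread can be illustrated with a toy C++ sketch of the `if (Inst.mayStore()) ... else if (isAtomicRet(Inst))` shape used in these branches. This is only a model under stated assumptions: `ToyInst`, `ExpScorePath`, and `classify` are hypothetical names invented here and are not LLVM APIs, and the claim that a read-modify-write atomic sets both mayLoad and mayStore is taken from foad's comment rather than verified against the instruction definitions.

```cpp
#include <cassert>

// Hypothetical stand-in for MachineInstr; the fields only mirror the
// predicates queried by the branch under discussion.
struct ToyInst {
  bool MayLoad;
  bool MayStore;
  bool IsAtomicRet;
};

// Which arm of the if / else-if chain an instruction would take.
enum class ExpScorePath { Store, AtomicRet, None };

// Same control-flow shape as the reviewed code:
//   if (Inst.mayStore()) { ... } else if (isAtomicRet(Inst)) { ... }
ExpScorePath classify(const ToyInst &I) {
  if (I.MayStore)
    return ExpScorePath::Store;     // every storing instruction ends up here
  if (I.IsAtomicRet)
    return ExpScorePath::AtomicRet; // reached only when MayStore is false
  return ExpScorePath::None;
}
```

In this model a returning atomic that is also mayStore always takes the first arm, so the second arm is reachable only for an instruction flagged as a returning atomic that does not store.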
setExpScore(		setExpScore(
&Inst, TII, TRI, MRI,		&Inst, TII, TRI, MRI,
AMDGPU::getNamedOperandIdx(Inst.getOpcode(), AMDGPU::OpName::data),		AMDGPU::getNamedOperandIdx(Inst.getOpcode(), AMDGPU::OpName::data),
CurrScore);		CurrScore);
}		}
} else if (TII->isMTBUF(Inst)) {		} else if (TII->isMTBUF(Inst)) {
if (Inst.mayStore()) {		if (Inst.mayStore()) {
setExpScore(&Inst, TII, TRI, MRI, 0, CurrScore);		setExpScore(&Inst, TII, TRI, MRI, 0, CurrScore);
}		}
} else if (TII->isMUBUF(Inst)) {		} else if (TII->isMUBUF(Inst)) {
if (Inst.mayStore()) {		if (Inst.mayStore()) {
setExpScore(&Inst, TII, TRI, MRI, 0, CurrScore);		setExpScore(&Inst, TII, TRI, MRI, 0, CurrScore);
} else if (AMDGPU::getAtomicNoRetOp(Inst.getOpcode()) != -1) {		} else if (SIInstrInfo::isAtomicRet(Inst)) {
setExpScore(		setExpScore(
&Inst, TII, TRI, MRI,		&Inst, TII, TRI, MRI,
AMDGPU::getNamedOperandIdx(Inst.getOpcode(), AMDGPU::OpName::data),		AMDGPU::getNamedOperandIdx(Inst.getOpcode(), AMDGPU::OpName::data),
CurrScore);		CurrScore);
}		}
} else {		} else {
if (TII->isEXP(Inst)) {		if (TII->isEXP(Inst)) {
// For export the destination registers are really temps that		// For export the destination registers are really temps that
▲ Show 20 Lines • Show All 647 Lines • ▼ Show 20 Lines	if (TII->isDS(Inst) && TII->usesLGKM_CNT(Inst)) {
assert(Inst.mayLoadOrStore());		assert(Inst.mayLoadOrStore());

int FlatASCount = 0;		int FlatASCount = 0;

if (mayAccessVMEMThroughFlat(Inst)) {		if (mayAccessVMEMThroughFlat(Inst)) {
++FlatASCount;		++FlatASCount;
if (!ST->hasVscnt())		if (!ST->hasVscnt())
ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_ACCESS, Inst);		ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_ACCESS, Inst);
else if (Inst.mayLoad() &&		else if (Inst.mayLoad() && !SIInstrInfo::isAtomicNoRet(Inst))
AMDGPU::getAtomicRetOp(Inst.getOpcode()) == -1)
ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_READ_ACCESS, Inst);		ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_READ_ACCESS, Inst);
else		else
ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_WRITE_ACCESS, Inst);		ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_WRITE_ACCESS, Inst);
}		}

if (mayAccessLDSThroughFlat(Inst)) {		if (mayAccessLDSThroughFlat(Inst)) {
++FlatASCount;		++FlatASCount;
ScoreBrackets->updateByEvent(TII, TRI, MRI, LDS_ACCESS, Inst);		ScoreBrackets->updateByEvent(TII, TRI, MRI, LDS_ACCESS, Inst);
Show All 11 Lines	if (FlatASCount > 1)
// TODO: get a better carve out.		// TODO: get a better carve out.
Inst.getOpcode() != AMDGPU::BUFFER_WBINVL1 &&		Inst.getOpcode() != AMDGPU::BUFFER_WBINVL1 &&
Inst.getOpcode() != AMDGPU::BUFFER_WBINVL1_SC &&		Inst.getOpcode() != AMDGPU::BUFFER_WBINVL1_SC &&
Inst.getOpcode() != AMDGPU::BUFFER_WBINVL1_VOL &&		Inst.getOpcode() != AMDGPU::BUFFER_WBINVL1_VOL &&
Inst.getOpcode() != AMDGPU::BUFFER_GL0_INV &&		Inst.getOpcode() != AMDGPU::BUFFER_GL0_INV &&
Inst.getOpcode() != AMDGPU::BUFFER_GL1_INV) {		Inst.getOpcode() != AMDGPU::BUFFER_GL1_INV) {
if (!ST->hasVscnt())		if (!ST->hasVscnt())
ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_ACCESS, Inst);		ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_ACCESS, Inst);
else if ((Inst.mayLoad() &&		else if ((Inst.mayLoad() && !SIInstrInfo::isAtomicNoRet(Inst)) ||
AMDGPU::getAtomicRetOp(Inst.getOpcode()) == -1) ||
/* IMAGE_GET_RESINFO / IMAGE_GET_LOD */		/* IMAGE_GET_RESINFO / IMAGE_GET_LOD */
(TII->isMIMG(Inst) && !Inst.mayLoad() && !Inst.mayStore()))		(TII->isMIMG(Inst) && !Inst.mayLoad() && !Inst.mayStore()))
ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_READ_ACCESS, Inst);		ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_READ_ACCESS, Inst);
else if (Inst.mayStore())		else if (Inst.mayStore())
ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_WRITE_ACCESS, Inst);		ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_WRITE_ACCESS, Inst);

if (ST->vmemWriteNeedsExpWaitcnt() &&		if (ST->vmemWriteNeedsExpWaitcnt() &&
(Inst.mayStore() || AMDGPU::getAtomicNoRetOp(Inst.getOpcode()) != -1)) {		(Inst.mayStore() || SIInstrInfo::isAtomicRet(Inst))) {
ScoreBrackets->updateByEvent(TII, TRI, MRI, VMW_GPR_LOCK, Inst);		ScoreBrackets->updateByEvent(TII, TRI, MRI, VMW_GPR_LOCK, Inst);
}		}
} else if (TII->isSMRD(Inst)) {		} else if (TII->isSMRD(Inst)) {
ScoreBrackets->updateByEvent(TII, TRI, MRI, SMEM_ACCESS, Inst);		ScoreBrackets->updateByEvent(TII, TRI, MRI, SMEM_ACCESS, Inst);
} else if (Inst.isCall()) {		} else if (Inst.isCall()) {
if (callWaitsOnFunctionReturn(Inst)) {		if (callWaitsOnFunctionReturn(Inst)) {
// Act as a wait on everything		// Act as a wait on everything
ScoreBrackets->applyWaitcnt(AMDGPU::Waitcnt::allZero(ST->hasVscnt()));		ScoreBrackets->applyWaitcnt(AMDGPU::Waitcnt::allZero(ST->hasVscnt()));
▲ Show 20 Lines • Show All 387 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/SIInstrFormats.td

Show First 20 Lines • Show All 129 Lines • ▼ Show 20 Lines	class InstSI <dag outs, dag ins, string asm = "",

// This bit indicates that this is one of DOT instructions.		// This bit indicates that this is one of DOT instructions.
field bit IsDOT = 0;		field bit IsDOT = 0;

// This field indicates that FLAT instruction accesses FLAT_SCRATCH segment.		// This field indicates that FLAT instruction accesses FLAT_SCRATCH segment.
// Must be 0 for non-FLAT instructions.		// Must be 0 for non-FLAT instructions.
field bit IsFlatScratch = 0;		field bit IsFlatScratch = 0;

		// Atomic without a return.
		field bit IsAtomicNoRet = 0;

		// Atomic with return.
		field bit IsAtomicRet = 0;

// These need to be kept in sync with the enum in SIInstrFlags.		// These need to be kept in sync with the enum in SIInstrFlags.
let TSFlags{0} = SALU;		let TSFlags{0} = SALU;
let TSFlags{1} = VALU;		let TSFlags{1} = VALU;

let TSFlags{2} = SOP1;		let TSFlags{2} = SOP1;
let TSFlags{3} = SOP2;		let TSFlags{3} = SOP2;
let TSFlags{4} = SOPC;		let TSFlags{4} = SOPC;
let TSFlags{5} = SOPK;		let TSFlags{5} = SOPK;
▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines	class InstSI <dag outs, dag ins, string asm = "",
let TSFlags{53} = FPAtomic;		let TSFlags{53} = FPAtomic;

let TSFlags{54} = IsMAI;		let TSFlags{54} = IsMAI;

let TSFlags{55} = IsDOT;		let TSFlags{55} = IsDOT;

let TSFlags{56} = IsFlatScratch;		let TSFlags{56} = IsFlatScratch;

		let TSFlags{57} = IsAtomicNoRet;

		let TSFlags{58} = IsAtomicRet;

let SchedRW = [Write32Bit];		let SchedRW = [Write32Bit];

let AsmVariantName = AMDGPUAsmVariants.Default;		let AsmVariantName = AMDGPUAsmVariants.Default;

// Avoid changing source registers in a way that violates constant bus read limitations.		// Avoid changing source registers in a way that violates constant bus read limitations.
let hasExtraSrcRegAllocReq = !or(VOP1, VOP2, VOP3, VOPC, SDWA, VALU);		let hasExtraSrcRegAllocReq = !or(VOP1, VOP2, VOP3, VOPC, SDWA, VALU);
}		}

▲ Show 20 Lines • Show All 153 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/SIInstrInfo.h

Show First 20 Lines • Show All 532 Lines • ▼ Show 20 Lines	public:
static bool isEXP(const MachineInstr &MI) {		static bool isEXP(const MachineInstr &MI) {
return MI.getDesc().TSFlags & SIInstrFlags::EXP;		return MI.getDesc().TSFlags & SIInstrFlags::EXP;
}		}

bool isEXP(uint16_t Opcode) const {		bool isEXP(uint16_t Opcode) const {
return get(Opcode).TSFlags & SIInstrFlags::EXP;		return get(Opcode).TSFlags & SIInstrFlags::EXP;
}		}

		static bool isAtomicNoRet(const MachineInstr &MI) {
		return MI.getDesc().TSFlags & SIInstrFlags::IsAtomicNoRet;
		}

		bool isAtomicNoRet(uint16_t Opcode) const {
		return get(Opcode).TSFlags & SIInstrFlags::IsAtomicNoRet;
		}

		static bool isAtomicRet(const MachineInstr &MI) {
		return MI.getDesc().TSFlags & SIInstrFlags::IsAtomicRet;
		}

		bool isAtomicRet(uint16_t Opcode) const {
		return get(Opcode).TSFlags & SIInstrFlags::IsAtomicRet;
		}

		static bool isAtomic(const MachineInstr &MI) {
		foad: I am surprised that there isn't already a MachineInstr::IsAtomic method.
		rampitec: Me neither.
		return MI.getDesc().TSFlags & (SIInstrFlags::IsAtomicRet |
		SIInstrFlags::IsAtomicNoRet);
		}

		bool isAtomic(uint16_t Opcode) const {
		return get(Opcode).TSFlags & (SIInstrFlags::IsAtomicRet |
		SIInstrFlags::IsAtomicNoRet);
		}
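
To make the motivation for the flag-based predicates above concrete, here is a small self-contained C++ sketch contrasting a mapping-table lookup with a TSFlags bit test. It is a toy model, not the LLVM implementation: `ToyDesc`, `isAtomicRetViaMap`, and the `toy_add` opcode names are hypothetical, and only the bit positions 57 and 58 and the name `ds_wrxchg_rtn_b32` are taken from this revision.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Bit positions follow the TSFlags{57} / TSFlags{58} assignments in this diff.
constexpr uint64_t IsAtomicNoRet = UINT64_C(1) << 57;
constexpr uint64_t IsAtomicRet = UINT64_C(1) << 58;

// Hypothetical stand-in for an instruction descriptor.
struct ToyDesc {
  std::string Name;
  uint64_t TSFlags;
};

// Flag-based predicates, same shape as the helpers added in SIInstrInfo.h.
bool isAtomicNoRet(const ToyDesc &D) { return (D.TSFlags & IsAtomicNoRet) != 0; }
bool isAtomicRet(const ToyDesc &D) { return (D.TSFlags & IsAtomicRet) != 0; }
bool isAtomic(const ToyDesc &D) {
  return (D.TSFlags & (IsAtomicRet | IsAtomicNoRet)) != 0;
}

// Toy model of the old approach: an instruction counts as "rtn" only if the
// table has a row mapping it to a nortn counterpart.
bool isAtomicRetViaMap(const std::map<std::string, std::string> &RetToNoRet,
                       const ToyDesc &D) {
  return RetToNoRet.count(D.Name) != 0;
}
```

With a table that only contains a hypothetical `toy_add_rtn` to `toy_add` row, an instruction such as `ds_wrxchg_rtn_b32`, which the summary says has no nortn version, is missed by the map lookup but is still classified correctly by the flag test.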

static bool isWQM(const MachineInstr &MI) {		static bool isWQM(const MachineInstr &MI) {
return MI.getDesc().TSFlags & SIInstrFlags::WQM;		return MI.getDesc().TSFlags & SIInstrFlags::WQM;
}		}

bool isWQM(uint16_t Opcode) const {		bool isWQM(uint16_t Opcode) const {
return get(Opcode).TSFlags & SIInstrFlags::WQM;		return get(Opcode).TSFlags & SIInstrFlags::WQM;
}		}

▲ Show 20 Lines • Show All 612 Lines • ▼ Show 20 Lines	namespace AMDGPU {
/// \returns \p Opcode if it is an Addr64 opcode, otherwise -1.		/// \returns \p Opcode if it is an Addr64 opcode, otherwise -1.
LLVM_READONLY		LLVM_READONLY
int getIfAddr64Inst(uint16_t Opcode);		int getIfAddr64Inst(uint16_t Opcode);

LLVM_READONLY		LLVM_READONLY
int getMUBUFNoLdsInst(uint16_t Opcode);		int getMUBUFNoLdsInst(uint16_t Opcode);

LLVM_READONLY		LLVM_READONLY
int getAtomicRetOp(uint16_t Opcode);

LLVM_READONLY
int getAtomicNoRetOp(uint16_t Opcode);		int getAtomicNoRetOp(uint16_t Opcode);

LLVM_READONLY		LLVM_READONLY
int getSOPKOp(uint16_t Opcode);		int getSOPKOp(uint16_t Opcode);

LLVM_READONLY		LLVM_READONLY
int getGlobalSaddrOp(uint16_t Opcode);		int getGlobalSaddrOp(uint16_t Opcode);

Show All 38 Lines

llvm/lib/Target/AMDGPU/SIInstrInfo.td

	Show First 20 Lines • Show All 2,402 Lines • ▼ Show 20 Lines
	def getMUBUFNoLdsInst : InstrMapping {			def getMUBUFNoLdsInst : InstrMapping {
	let FilterClass = "MUBUFLdsTable";			let FilterClass = "MUBUFLdsTable";
	let RowFields = ["OpName"];			let RowFields = ["OpName"];
	let ColFields = ["IsLds"];			let ColFields = ["IsLds"];
	let KeyCol = ["1"];			let KeyCol = ["1"];
	let ValueCols = [["0"]];			let ValueCols = [["0"]];
	}			}

	// Maps an atomic opcode to its version with a return value.
	def getAtomicRetOp : InstrMapping {
	let FilterClass = "AtomicNoRet";
	let RowFields = ["NoRetOp"];
	let ColFields = ["IsRet"];
	let KeyCol = ["0"];
	let ValueCols = [["1"]];
	}

	// Maps an atomic opcode to its returnless version.			// Maps an atomic opcode to its returnless version.
	def getAtomicNoRetOp : InstrMapping {			def getAtomicNoRetOp : InstrMapping {
	let FilterClass = "AtomicNoRet";			let FilterClass = "AtomicNoRet";
	let RowFields = ["NoRetOp"];			let RowFields = ["NoRetOp"];
	let ColFields = ["IsRet"];			let ColFields = ["IsRet"];
	let KeyCol = ["1"];			let KeyCol = ["1"];
	let ValueCols = [["0"]];			let ValueCols = [["0"]];
	}			}
	▲ Show 20 Lines • Show All 49 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp

Show First 20 Lines • Show All 449 Lines • ▼ Show 20 Lines	private:
std::unique_ptr<SICacheControl> CC = nullptr;		std::unique_ptr<SICacheControl> CC = nullptr;

/// List of atomic pseudo instructions.		/// List of atomic pseudo instructions.
std::list<MachineBasicBlock::iterator> AtomicPseudoMIs;		std::list<MachineBasicBlock::iterator> AtomicPseudoMIs;

/// Return true iff instruction \p MI is a atomic instruction that		/// Return true iff instruction \p MI is a atomic instruction that
/// returns a result.		/// returns a result.
bool isAtomicRet(const MachineInstr &MI) const {		bool isAtomicRet(const MachineInstr &MI) const {
return AMDGPU::getAtomicNoRetOp(MI.getOpcode()) != -1;		return SIInstrInfo::isAtomicRet(MI);
}		}

/// Removes all processed atomic pseudo instructions from the current		/// Removes all processed atomic pseudo instructions from the current
/// function. Returns true if current function is modified, false otherwise.		/// function. Returns true if current function is modified, false otherwise.
bool removeAtomicPseudoMIs();		bool removeAtomicPseudoMIs();

/// Expands load operation \p MI. Returns true if instructions are		/// Expands load operation \p MI. Returns true if instructions are
/// added/deleted or \p MI is modified, false otherwise.		/// added/deleted or \p MI is modified, false otherwise.
▲ Show 20 Lines • Show All 1,009 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/memory-legalizer-local-agent.ll

	Show First 20 Lines • Show All 589 Lines • ▼ Show 20 Lines
	;			;
	; GFX10-WGP-LABEL: local_agent_acquire_atomicrmw:			; GFX10-WGP-LABEL: local_agent_acquire_atomicrmw:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_agent_acquire_atomicrmw:			; GFX10-CU-LABEL: local_agent_acquire_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0			; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0
	▲ Show 20 Lines • Show All 109 Lines • ▼ Show 20 Lines
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_agent_acq_rel_atomicrmw:			; GFX10-CU-LABEL: local_agent_acq_rel_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0			; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0
	▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_agent_seq_cst_atomicrmw:			; GFX10-CU-LABEL: local_agent_seq_cst_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0			; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0
	▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines
	;			;
	; GFX10-WGP-LABEL: local_agent_acquire_ret_atomicrmw:			; GFX10-WGP-LABEL: local_agent_acquire_ret_atomicrmw:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: ds_write_b32 v0, v1			; GFX10-WGP-NEXT: ds_write_b32 v0, v1
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_agent_acquire_ret_atomicrmw:			; GFX10-CU-LABEL: local_agent_acquire_ret_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: ds_write_b32 v0, v1			; GFX10-WGP-NEXT: ds_write_b32 v0, v1
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_agent_acq_rel_ret_atomicrmw:			; GFX10-CU-LABEL: local_agent_acq_rel_ret_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: ds_write_b32 v0, v1			; GFX10-WGP-NEXT: ds_write_b32 v0, v1
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_agent_seq_cst_ret_atomicrmw:			; GFX10-CU-LABEL: local_agent_seq_cst_ret_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	▲ Show 20 Lines • Show All 3,550 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/memory-legalizer-local-system.ll

	Show First 20 Lines • Show All 589 Lines • ▼ Show 20 Lines
	;			;
	; GFX10-WGP-LABEL: local_system_acquire_atomicrmw:			; GFX10-WGP-LABEL: local_system_acquire_atomicrmw:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_system_acquire_atomicrmw:			; GFX10-CU-LABEL: local_system_acquire_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0			; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0
	▲ Show 20 Lines • Show All 109 Lines • ▼ Show 20 Lines
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_system_acq_rel_atomicrmw:			; GFX10-CU-LABEL: local_system_acq_rel_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0			; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0
	▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_system_seq_cst_atomicrmw:			; GFX10-CU-LABEL: local_system_seq_cst_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0			; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0
	▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines
	;			;
	; GFX10-WGP-LABEL: local_system_acquire_ret_atomicrmw:			; GFX10-WGP-LABEL: local_system_acquire_ret_atomicrmw:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: ds_write_b32 v0, v1			; GFX10-WGP-NEXT: ds_write_b32 v0, v1
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_system_acquire_ret_atomicrmw:			; GFX10-CU-LABEL: local_system_acquire_ret_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: ds_write_b32 v0, v1			; GFX10-WGP-NEXT: ds_write_b32 v0, v1
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_system_acq_rel_ret_atomicrmw:			; GFX10-CU-LABEL: local_system_acq_rel_ret_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: ds_write_b32 v0, v1			; GFX10-WGP-NEXT: ds_write_b32 v0, v1
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_system_seq_cst_ret_atomicrmw:			; GFX10-CU-LABEL: local_system_seq_cst_ret_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	▲ Show 20 Lines • Show All 3,550 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/memory-legalizer-local-workgroup.ll

	Show First 20 Lines • Show All 589 Lines • ▼ Show 20 Lines
	;			;
	; GFX10-WGP-LABEL: local_workgroup_acquire_atomicrmw:			; GFX10-WGP-LABEL: local_workgroup_acquire_atomicrmw:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_workgroup_acquire_atomicrmw:			; GFX10-CU-LABEL: local_workgroup_acquire_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0			; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0
	▲ Show 20 Lines • Show All 109 Lines • ▼ Show 20 Lines
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_workgroup_acq_rel_atomicrmw:			; GFX10-CU-LABEL: local_workgroup_acq_rel_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0			; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0
	[50 lines not shown]
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v0, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_workgroup_seq_cst_atomicrmw:			; GFX10-CU-LABEL: local_workgroup_seq_cst_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0			; GFX10-CU-NEXT: v_mov_b32_e32 v0, s0
	[48 lines not shown]
	;			;
	; GFX10-WGP-LABEL: local_workgroup_acquire_ret_atomicrmw:			; GFX10-WGP-LABEL: local_workgroup_acquire_ret_atomicrmw:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: ds_write_b32 v0, v1			; GFX10-WGP-NEXT: ds_write_b32 v0, v1
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_workgroup_acquire_ret_atomicrmw:			; GFX10-CU-LABEL: local_workgroup_acquire_ret_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	[54 lines not shown]
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: ds_write_b32 v0, v1			; GFX10-WGP-NEXT: ds_write_b32 v0, v1
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_workgroup_acq_rel_ret_atomicrmw:			; GFX10-CU-LABEL: local_workgroup_acq_rel_ret_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	[56 lines not shown]
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0			; GFX10-WGP-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1			; GFX10-WGP-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1			; GFX10-WGP-NEXT: ds_wrxchg_rtn_b32 v1, v0, v1
	; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: ds_write_b32 v0, v1			; GFX10-WGP-NEXT: ds_write_b32 v0, v1
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: local_workgroup_seq_cst_ret_atomicrmw:			; GFX10-CU-LABEL: local_workgroup_seq_cst_ret_atomicrmw:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	[3,550 lines not shown]

Revision Contents

Diff 323811

llvm/lib/Target/AMDGPU/BUFInstructions.td

llvm/lib/Target/AMDGPU/DSInstructions.td

llvm/lib/Target/AMDGPU/FLATInstructions.td

llvm/lib/Target/AMDGPU/MIMGInstructions.td

llvm/lib/Target/AMDGPU/SIDefines.h

llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp

llvm/lib/Target/AMDGPU/SIInstrFormats.td

llvm/lib/Target/AMDGPU/SIInstrInfo.h

llvm/lib/Target/AMDGPU/SIInstrInfo.td

llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp

llvm/test/CodeGen/AMDGPU/memory-legalizer-local-agent.ll

llvm/test/CodeGen/AMDGPU/memory-legalizer-local-system.ll

llvm/test/CodeGen/AMDGPU/memory-legalizer-local-workgroup.ll
