This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11952–11957	Is this actually target specific? I thought this might have been all targets
11954–11955	Isn't there some scope greater or equal function? Should avoid directly referencing one-as

rampitec added inline comments. Mar 5 2021, 5:18 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11952–11957	Yes, it is specific to this and only this target.
11954–11955	There is, but it is located in the MMI, which is not yet available here. In fact I don't even have an MF yet to query it. Anyway, since this is specific to a single target and that target does not have a higher scope, I suppose it should be OK.

arsenm added inline comments. Mar 5 2021, 5:24 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955	Can it be moved out of the MMI? That sounds like a weird place for it

rampitec added inline comments. Mar 5 2021, 5:27 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955	Any good ideas where to? It needs an LLVMContext, so definitely something which has access to Module.

arsenm added inline comments. Mar 5 2021, 5:37 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955	It seems like it's a target-specific context, its own thing separate from the MMI. I guess we don't have anything like that now, but I would think it would look like a function returning a struct initialized on the first call.

rampitec added inline comments. Mar 5 2021, 5:42 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955	Technically it depends on the target (scope ordering) and not on a subtarget. At least so far, we do not envision changes so drastic that they would require subtarget differentiation of the scopes. I guess we would resist as much as we can if that is proposed. So technically it should not need the MF, only the Module.

t-tye requested changes to this revision. Mar 5 2021, 5:43 PM

t-tye added inline comments.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955	This needs to happen regardless of scope. We implement these operations as rmw atomic instructions regardless of the scope requested, so they will always be forwarded to the L2. If the memory address happens to have an MTYPE that causes them not to happen in the L2, then the expansion must happen. Since the compiler does not know what memory may be being accessed the expansion must happen for all scopes for all accesses. Unsafe-atomics can relax this for scopes <=agent. It is a promise that such atomics will never be to memory that will not be cached in the L2.

This revision now requires changes to proceed. Mar 5 2021, 5:43 PM

t-tye added inline comments. Mar 5 2021, 5:46 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955	Scopes are target specific. Whether they are hierarchical is target specific. I believe there are target functions to query if the target supports scope inclusion, and will compare them. The SIMemoryLegalizer is using them to determine the memory properties of atomic instructions.
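As a rough illustration of the kind of scope-inclusion query mentioned here, the following is a toy model only. The `Scope` enum and the `includes` function are hypothetical names invented for this sketch and are not LLVM API; the real backend works with `SyncScope::ID` values and also distinguishes one-as variants, which this sketch leaves out.

```cpp
#include <cassert>

// Hypothetical, simplified scope hierarchy for illustration only.
// A wider scope gets a larger numeric value.
enum class Scope {
  SingleThread = 0,
  Wavefront = 1,
  Workgroup = 2,
  Agent = 3,
  System = 4
};

// Returns true if scope A is at least as wide as scope B,
// i.e. everything synchronized at B is also covered by A.
constexpr bool includes(Scope A, Scope B) {
  return static_cast<int>(A) >= static_cast<int>(B);
}

// Compile-time checks of the ordering assumed above.
static_assert(includes(Scope::System, Scope::Agent), "system covers agent");
static_assert(!includes(Scope::Workgroup, Scope::Agent),
              "workgroup does not cover agent");
```

With a query like this, a check such as "scope is wider than agent" could be written without naming individual scope strings, which is the concern raised about referencing one-as directly.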

rampitec added inline comments. Mar 5 2021, 5:49 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955	That's exactly what's written in the code. If unsafe atomics are not enabled it will bail to CAS two lines before. What exactly do you want to change here?

rampitec added inline comments. Mar 5 2021, 5:52 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955	"scopes are target specific". And? What exactly in the statement they "are target specific" precludes them from being target specific?

t-tye added inline comments. Mar 5 2021, 6:04 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955	Sounded like a question was being asked about scopes being hierarchical so just pointing to SIMemoryLegalizer that does use that concept. But reading the code more closely I see that that is not relevant here.
11954–11955	Sorry, I misread the code so my comment can be ignored. But I did have a question about why the address space is being considered in deciding if expansion is needed.

rampitec added inline comments. Mar 5 2021, 6:10 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955	It is more about whether it is target specific or subtarget specific. So far it is not subtarget specific, and I hope it will stay that way.
11954–11955	That is because of the ISA. If we don't have the needed instructions, we need to expand it. Compare-exchange is universally available, but the specific instructions exist only in some address spaces.

t-tye added inline comments. Mar 5 2021, 6:19 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955	So cmpxchg is only available for FLAT address space? So what happens for GLOBAL address space, doesn't that have the same issue?

A side note: one in fact needs to be extremely lucky or insistent to get these instructions. First, a proper C(++) atomic has to be used. Then the denorm mode must match. Then the unsafe atomics option has to be used. The return value must be ignored on gfx908. Now the scope must also be non-default and less than system. I wonder if anyone will ever use it at all. The stars really need to align for someone to get it, and then they must be careful not to violate all these promises. Oh, well.
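The chain of conditions listed above can be sketched as a small standalone model of the FAdd decision in the committed diff. This is a simplified illustration, not the LLVM code: `AddrSpace`, `Scope`, `Expansion`, `FAddQuery` and `expandFAdd` are hypothetical names, the boolean fields stand in for the subtarget and instruction queries, and the final fallthrough return is an assumption because the diff excerpt does not show what follows the block.

```cpp
#include <cassert>

// Hypothetical stand-ins for the LLVM types used in the diff; not LLVM API.
enum class AddrSpace { Flat, Global, Other };
enum class Scope { System, OneAS, Agent }; // only the cases discussed here
enum class Expansion { None, CmpXChg };

struct FAddQuery {
  AddrSpace AS;
  Scope SSID;
  bool HasAtomicFaddInsts; // models Subtarget->hasAtomicFaddInsts()
  bool HasGFX90AInsts;     // models Subtarget->hasGFX90AInsts()
  bool FPModeMatches;      // models fpModeMatchesGlobalFPAtomicMode(RMW)
  bool UnsafeFPAtomics;    // models "amdgpu-unsafe-fp-atomics" == "true"
  bool IsFloat;            // models Ty->isFloatTy()
  bool ResultUnused;       // models RMW->use_empty()
};

Expansion expandFAdd(const FAddQuery &Q) {
  if ((Q.AS == AddrSpace::Global || Q.AS == AddrSpace::Flat) &&
      Q.HasAtomicFaddInsts) {
    // Without a matching FP mode and the unsafe-atomics promise, use a CAS loop.
    if (!Q.FPModeMatches || !Q.UnsafeFPAtomics)
      return Expansion::CmpXChg;

    if (Q.HasGFX90AInsts) {
      // The change under review: system and one-as scopes always expand.
      if (Q.SSID == Scope::System || Q.SSID == Scope::OneAS)
        return Expansion::CmpXChg;
      return (Q.IsFloat && Q.AS == AddrSpace::Flat) ? Expansion::CmpXChg
                                                    : Expansion::None;
    }

    // Without gfx90a instructions, only the global address space qualifies.
    if (Q.AS != AddrSpace::Global)
      return Expansion::CmpXChg;

    // The native instruction is used only when the result is ignored.
    return Q.ResultUnused ? Expansion::None : Expansion::CmpXChg;
  }
  // Assumption of this sketch: everything else falls back to a CAS loop.
  return Expansion::CmpXChg;
}
```

In this model, a double-precision global fadd on gfx90a with agent scope keeps the native instruction, while the same operation with system or one-as scope expands, which matches the `_agent` and `_system` test cases in fp64-atomics-gfx90a.ll.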

rampitec added inline comments. Mar 5 2021, 6:27 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955	Most of the cases end up with cmpxchg, which is universally available in the ISA and safe. As I said, the stars have to align for someone to get it in a form different from a CAS loop.
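For readers unfamiliar with the CAS-loop form, here is a minimal host-side sketch of what such an expansion does, written with standard C++ `std::atomic` rather than AMDGPU instructions. The function name `atomicFAddViaCAS` is made up for this example; the loop corresponds in spirit to the `atomicrmw.start` loop around `global_atomic_cmpswap_x2` in the test output.

```cpp
#include <atomic>
#include <cassert>

// Emulates an atomic floating-point add with a compare-exchange loop.
// Returns the value that was stored before the addition.
double atomicFAddViaCAS(std::atomic<double> &Target, double Addend) {
  double Expected = Target.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak writes the current value into Expected,
  // so the sum is recomputed from fresh data on the next iteration.
  while (!Target.compare_exchange_weak(Expected, Expected + Addend,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed)) {
  }
  return Expected;
}
```

The loop is always correct but costs at least one extra load and a retry under contention, which is why the native instruction is preferable when all of the promises hold.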

rampitec added inline comments. Mar 5 2021, 6:31 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955	Comment is ignored, granted. What about "revision needs change" status?

Discussed offline.

This revision now requires review to proceed. Mar 5 2021, 6:44 PM

Herald added a subscriber: t-tye. Mar 5 2021, 6:44 PM

rampitec added inline comments. Mar 5 2021, 11:15 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955	In general I think all of this discussion about avoiding directly accessing "one-as" is rooted in the issue that our scopes are strings and not standard symbolic enums, with all the attached frustration. I completely agree, but this is really much bigger than this workaround.

t-tye added inline comments. Mar 6 2021, 10:32 AM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955	What about "revision needs change" status? I tried to remove the "revision needs change" status, did I succeed?

rampitec added inline comments. Mar 6 2021, 10:35 AM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
11954–11955	Yep, thank you!

Harbormaster completed remote builds in B92414: Diff 328669. Mar 6 2021, 11:38 AM

I am looking for a place to move the syncscope initialization to and cannot find one. I do not think I can just initialize it on the first query and cache it, because it depends on the LLVMContext. I have also double-checked that neither the MF nor the MMI has been created yet.

rampitec added a child revision: D98221: [AMDGPU] Disable SCC bit on fp atomics. Mar 8 2021, 4:24 PM

Note what would happen if I cache this value at the first call:

SyncScope::ID LLVMContextImpl::getOrInsertSyncScopeID(StringRef SSN) {
  auto NewSSID = SSC.size();
  assert(NewSSID < std::numeric_limits<SyncScope::ID>::max() &&
         "Hit the maximum number of synchronization scopes allowed!");
  return SSC.insert(std::make_pair(SSN, SyncScope::ID(NewSSID))).first->second;
}

It will lock in a numeric value that depends on the call history of getOrInsertSyncScopeID(). Caching without mapping to a context will prohibit any part of the compiler from ever calling getOrInsertSyncScopeID() and then reusing the AMDGPU BE. If the same instance of SIISelLowering is called with a different module with a different Context, the cache may contain a wrong value. Besides, even that depends on the internal implementation of the Context. AMDGPUMachineModuleInfo is fine to do it because it has the module context, but anything that runs before the MF is created is not.
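The hazard can be reproduced with a toy stand-in for the context. In the sketch below, `ToyContext` is a hypothetical type that only mirrors the insertion-order numbering of the function quoted above; it is not LLVM API and it does not model the scopes that a real LLVMContext registers up front.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Toy stand-in for a context that hands out scope IDs in insertion order.
struct ToyContext {
  std::map<std::string, std::uint8_t> SSC;

  std::uint8_t getOrInsertSyncScopeID(const std::string &Name) {
    // The candidate ID is the number of scopes registered so far.
    auto NewID = static_cast<std::uint8_t>(SSC.size());
    // insert() leaves an existing entry untouched and returns it.
    return SSC.insert(std::make_pair(Name, NewID)).first->second;
  }
};
```

If one context has already registered "agent" before "one-as" is requested and another has not, the two contexts return different IDs for "one-as", so a value cached from the first would be wrong when compared against scope IDs from the second.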

I didn't mean the LLVMContext, I mean a new target context which would be a new concept. That doesn't necessarily belong in this patch, but I do think it would be better
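One possible reading of this idea, combined with the earlier objection that cached IDs must be tied to a context, is a lazily filled table keyed by context. The sketch below is purely hypothetical: `ToyContext`, `ToyTargetScopes` and `ToyTargetContext` are invented names, nothing here reflects what D98304 or LLVM actually implements, and a real design would also have to handle context destruction.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Toy context that numbers scopes in insertion order (not LLVM API).
struct ToyContext {
  std::map<std::string, std::uint8_t> SSC;
  std::uint8_t getOrInsertSyncScopeID(const std::string &Name) {
    auto NewID = static_cast<std::uint8_t>(SSC.size());
    return SSC.insert(std::make_pair(Name, NewID)).first->second;
  }
};

// Hypothetical bundle of target scope IDs, resolved once per context.
struct ToyTargetScopes {
  std::uint8_t Agent;
  std::uint8_t OneAS;
};

// Hypothetical "target context": initializes the struct on the first query
// for a given context and reuses it afterwards.
class ToyTargetContext {
  std::map<const ToyContext *, ToyTargetScopes> Cache;

public:
  const ToyTargetScopes &get(ToyContext &Ctx) {
    auto It = Cache.find(&Ctx);
    if (It == Cache.end()) {
      // Braced initialization evaluates its elements left to right.
      ToyTargetScopes S{Ctx.getOrInsertSyncScopeID("agent"),
                        Ctx.getOrInsertSyncScopeID("one-as")};
      It = Cache.emplace(&Ctx, S).first;
    }
    return It->second;
  }
};
```

Keying by context avoids the stale-ID problem, at the cost of holding raw pointers to contexts whose lifetime the cache does not control.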

I completely agree, and this really does not belong to this patch.

Probably something like this: D98304?

arsenm accepted this revision. Mar 10 2021, 11:54 AM

This revision is now accepted and ready to land. Mar 10 2021, 11:54 AM

Closed by commit rG574a9dabc63b: [AMDGPU] Always expand system scope fp atomics on gfx90a (authored by rampitec). Mar 10 2021, 12:35 PM

This revision was automatically updated to reflect the committed changes.

rampitec added a commit: rG574a9dabc63b: [AMDGPU] Always expand system scope fp atomics on gfx90a.

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/SIISelLowering.cpp	8 lines
llvm/test/CodeGen/AMDGPU/fp64-atomics-gfx90a.ll	271 lines
llvm/test/CodeGen/AMDGPU/global-atomics-fp.ll	194 lines

Diff 329738

llvm/lib/Target/AMDGPU/SIISelLowering.cpp


case AtomicRMWInst::FAdd: {		case AtomicRMWInst::FAdd: {

if ((AS == AMDGPUAS::GLOBAL_ADDRESS || AS == AMDGPUAS::FLAT_ADDRESS) &&		if ((AS == AMDGPUAS::GLOBAL_ADDRESS || AS == AMDGPUAS::FLAT_ADDRESS) &&
Subtarget->hasAtomicFaddInsts()) {		Subtarget->hasAtomicFaddInsts()) {
if (!fpModeMatchesGlobalFPAtomicMode(RMW) ||		if (!fpModeMatchesGlobalFPAtomicMode(RMW) ||
RMW->getFunction()->getFnAttribute("amdgpu-unsafe-fp-atomics")		RMW->getFunction()->getFnAttribute("amdgpu-unsafe-fp-atomics")
.getValueAsString() != "true")		.getValueAsString() != "true")
return AtomicExpansionKind::CmpXChg;		return AtomicExpansionKind::CmpXChg;

if (Subtarget->hasGFX90AInsts())		if (Subtarget->hasGFX90AInsts()) {
		auto SSID = RMW->getSyncScopeID();
		if (SSID == SyncScope::System ||
		SSID == RMW->getContext().getOrInsertSyncScopeID("one-as"))
		return AtomicExpansionKind::CmpXChg;

return (Ty->isFloatTy() && AS == AMDGPUAS::FLAT_ADDRESS) ?		return (Ty->isFloatTy() && AS == AMDGPUAS::FLAT_ADDRESS) ?
AtomicExpansionKind::CmpXChg : AtomicExpansionKind::None;		AtomicExpansionKind::CmpXChg : AtomicExpansionKind::None;
		}

if (!Subtarget->hasGFX90AInsts() && AS != AMDGPUAS::GLOBAL_ADDRESS)		if (!Subtarget->hasGFX90AInsts() && AS != AMDGPUAS::GLOBAL_ADDRESS)
return AtomicExpansionKind::CmpXChg;		return AtomicExpansionKind::CmpXChg;

return RMW->use_empty() ? AtomicExpansionKind::None :		return RMW->use_empty() ? AtomicExpansionKind::None :
AtomicExpansionKind::CmpXChg;		AtomicExpansionKind::CmpXChg;
}		}


llvm/test/CodeGen/AMDGPU/fp64-atomics-gfx90a.ll

main_body:		main_body:
%ret = call double @llvm.amdgcn.global.atomic.fmax.f64.p1f64.f64(double addrspace(1)* %ptr, double %data)		%ret = call double @llvm.amdgcn.global.atomic.fmax.f64.p1f64.f64(double addrspace(1)* %ptr, double %data)
ret void		ret void
}		}

define amdgpu_kernel void @global_atomic_fadd_f64_noret_pat(double addrspace(1)* %ptr) #1 {		define amdgpu_kernel void @global_atomic_fadd_f64_noret_pat(double addrspace(1)* %ptr) #1 {
; GFX90A-LABEL: global_atomic_fadd_f64_noret_pat:		; GFX90A-LABEL: global_atomic_fadd_f64_noret_pat:
; GFX90A: ; %bb.0: ; %main_body		; GFX90A: ; %bb.0: ; %main_body
; GFX90A-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX90A-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX90A-NEXT: s_mov_b64 s[2:3], 0
		; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
		; GFX90A-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
		; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
		; GFX90A-NEXT: v_pk_mov_b32 v[2:3], s[4:5], s[4:5] op_sel:[0,1]
		; GFX90A-NEXT: BB24_1: ; %atomicrmw.start
		; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
		; GFX90A-NEXT: v_mov_b32_e32 v4, 0
		; GFX90A-NEXT: v_add_f64 v[0:1], v[2:3], 4.0
		; GFX90A-NEXT: buffer_wbl2
		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: global_atomic_cmpswap_x2 v[0:1], v4, v[0:3], s[0:1] glc scc
		; GFX90A-NEXT: s_waitcnt vmcnt(0)
		; GFX90A-NEXT: buffer_invl2
		; GFX90A-NEXT: s_waitcnt vmcnt(0)
		; GFX90A-NEXT: buffer_wbinvl1_vol
		; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[0:1], v[2:3]
		; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
		; GFX90A-NEXT: v_pk_mov_b32 v[2:3], v[0:1], v[0:1] op_sel:[0,1]
		; GFX90A-NEXT: s_andn2_b64 exec, exec, s[2:3]
		; GFX90A-NEXT: s_cbranch_execnz BB24_1
		; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end
		; GFX90A-NEXT: s_endpgm
		main_body:
		%ret = atomicrmw fadd double addrspace(1)* %ptr, double 4.0 seq_cst
		ret void
		}

		define amdgpu_kernel void @global_atomic_fadd_f64_noret_pat_agent(double addrspace(1)* %ptr) #1 {
		; GFX90A-LABEL: global_atomic_fadd_f64_noret_pat_agent:
		; GFX90A: ; %bb.0: ; %main_body
		; GFX90A-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX90A-NEXT: v_mov_b32_e32 v0, 0		; GFX90A-NEXT: v_mov_b32_e32 v0, 0
; GFX90A-NEXT: v_mov_b32_e32 v2, 0		; GFX90A-NEXT: v_mov_b32_e32 v2, 0
; GFX90A-NEXT: v_mov_b32_e32 v1, 0x40100000		; GFX90A-NEXT: v_mov_b32_e32 v1, 0x40100000
		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: global_atomic_add_f64 v2, v[0:1], s[0:1]
		; GFX90A-NEXT: s_waitcnt vmcnt(0)
		; GFX90A-NEXT: buffer_wbinvl1_vol
		; GFX90A-NEXT: s_endpgm
		main_body:
		%ret = atomicrmw fadd double addrspace(1)* %ptr, double 4.0 syncscope("agent") seq_cst
		ret void
		}

		define amdgpu_kernel void @global_atomic_fadd_f64_noret_pat_system(double addrspace(1)* %ptr) #1 {
		; GFX90A-LABEL: global_atomic_fadd_f64_noret_pat_system:
		; GFX90A: ; %bb.0: ; %main_body
		; GFX90A-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX90A-NEXT: s_mov_b64 s[2:3], 0
		; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
		; GFX90A-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
		; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
		; GFX90A-NEXT: v_pk_mov_b32 v[2:3], s[4:5], s[4:5] op_sel:[0,1]
		; GFX90A-NEXT: BB26_1: ; %atomicrmw.start
		; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
		; GFX90A-NEXT: v_mov_b32_e32 v4, 0
		; GFX90A-NEXT: v_add_f64 v[0:1], v[2:3], 4.0
; GFX90A-NEXT: buffer_wbl2		; GFX90A-NEXT: buffer_wbl2
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: global_atomic_add_f64 v2, v[0:1], s[0:1] scc		; GFX90A-NEXT: global_atomic_cmpswap_x2 v[0:1], v4, v[0:3], s[0:1] glc scc
; GFX90A-NEXT: s_waitcnt vmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_invl2		; GFX90A-NEXT: buffer_invl2
; GFX90A-NEXT: s_waitcnt vmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_wbinvl1_vol		; GFX90A-NEXT: buffer_wbinvl1_vol
		; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[0:1], v[2:3]
		; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
		; GFX90A-NEXT: v_pk_mov_b32 v[2:3], v[0:1], v[0:1] op_sel:[0,1]
		; GFX90A-NEXT: s_andn2_b64 exec, exec, s[2:3]
		; GFX90A-NEXT: s_cbranch_execnz BB26_1
		; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end
; GFX90A-NEXT: s_endpgm		; GFX90A-NEXT: s_endpgm
main_body:		main_body:
%ret = atomicrmw fadd double addrspace(1)* %ptr, double 4.0 seq_cst		%ret = atomicrmw fadd double addrspace(1)* %ptr, double 4.0 syncscope("one-as") seq_cst
ret void		ret void
}		}

define amdgpu_kernel void @global_atomic_fadd_f64_noret_pat_flush(double addrspace(1)* %ptr) #0 {		define amdgpu_kernel void @global_atomic_fadd_f64_noret_pat_flush(double addrspace(1)* %ptr) #0 {
; GFX90A-LABEL: global_atomic_fadd_f64_noret_pat_flush:		; GFX90A-LABEL: global_atomic_fadd_f64_noret_pat_flush:
; GFX90A: ; %bb.0: ; %main_body		; GFX90A: ; %bb.0: ; %main_body
; GFX90A-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX90A-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX90A-NEXT: s_mov_b64 s[2:3], 0		; GFX90A-NEXT: s_mov_b64 s[2:3], 0
; GFX90A-NEXT: s_waitcnt lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0		; GFX90A-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
; GFX90A-NEXT: s_waitcnt lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NEXT: v_pk_mov_b32 v[2:3], s[4:5], s[4:5] op_sel:[0,1]		; GFX90A-NEXT: v_pk_mov_b32 v[2:3], s[4:5], s[4:5] op_sel:[0,1]
; GFX90A-NEXT: BB25_1: ; %atomicrmw.start		; GFX90A-NEXT: BB27_1: ; %atomicrmw.start
; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1		; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX90A-NEXT: v_mov_b32_e32 v4, 0		; GFX90A-NEXT: v_mov_b32_e32 v4, 0
; GFX90A-NEXT: v_add_f64 v[0:1], v[2:3], 4.0		; GFX90A-NEXT: v_add_f64 v[0:1], v[2:3], 4.0
; GFX90A-NEXT: buffer_wbl2		; GFX90A-NEXT: buffer_wbl2
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: global_atomic_cmpswap_x2 v[0:1], v4, v[0:3], s[0:1] glc scc		; GFX90A-NEXT: global_atomic_cmpswap_x2 v[0:1], v4, v[0:3], s[0:1] glc scc
; GFX90A-NEXT: s_waitcnt vmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_invl2		; GFX90A-NEXT: buffer_invl2
; GFX90A-NEXT: s_waitcnt vmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_wbinvl1_vol		; GFX90A-NEXT: buffer_wbinvl1_vol
; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[0:1], v[2:3]		; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[0:1], v[2:3]
; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]		; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
; GFX90A-NEXT: v_pk_mov_b32 v[2:3], v[0:1], v[0:1] op_sel:[0,1]		; GFX90A-NEXT: v_pk_mov_b32 v[2:3], v[0:1], v[0:1] op_sel:[0,1]
; GFX90A-NEXT: s_andn2_b64 exec, exec, s[2:3]		; GFX90A-NEXT: s_andn2_b64 exec, exec, s[2:3]
; GFX90A-NEXT: s_cbranch_execnz BB25_1		; GFX90A-NEXT: s_cbranch_execnz BB27_1
; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end		; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end
; GFX90A-NEXT: s_endpgm		; GFX90A-NEXT: s_endpgm
main_body:		main_body:
%ret = atomicrmw fadd double addrspace(1)* %ptr, double 4.0 seq_cst		%ret = atomicrmw fadd double addrspace(1)* %ptr, double 4.0 syncscope("agent") seq_cst
ret void		ret void
}		}

define double @global_atomic_fadd_f64_rtn(double addrspace(1)* %ptr, double %data) {		define double @global_atomic_fadd_f64_rtn(double addrspace(1)* %ptr, double %data) {
; GFX90A-LABEL: global_atomic_fadd_f64_rtn:		; GFX90A-LABEL: global_atomic_fadd_f64_rtn:
; GFX90A: ; %bb.0: ; %main_body		; GFX90A: ; %bb.0: ; %main_body
; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX90A-NEXT: global_atomic_add_f64 v[0:1], v[0:1], v[2:3], off glc		; GFX90A-NEXT: global_atomic_add_f64 v[0:1], v[0:1], v[2:3], off glc
; GFX90A-NEXT: s_waitcnt vmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: s_setpc_b64 s[30:31]		; GFX90A-NEXT: s_setpc_b64 s[30:31]
main_body:		main_body:
%ret = call double @llvm.amdgcn.global.atomic.fadd.f64.p1f64.f64(double addrspace(1)* %ptr, double %data)		%ret = call double @llvm.amdgcn.global.atomic.fadd.f64.p1f64.f64(double addrspace(1)* %ptr, double %data)
ret double %ret		ret double %ret
}		}

define double @global_atomic_fadd_f64_rtn_pat(double addrspace(1)* %ptr, double %data) #1 {		define double @global_atomic_fadd_f64_rtn_pat(double addrspace(1)* %ptr, double %data) #1 {
; GFX90A-LABEL: global_atomic_fadd_f64_rtn_pat:		; GFX90A-LABEL: global_atomic_fadd_f64_rtn_pat:
; GFX90A: ; %bb.0: ; %main_body		; GFX90A: ; %bb.0: ; %main_body
; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: global_load_dwordx2 v[2:3], v[0:1], off
		; GFX90A-NEXT: s_mov_b64 s[4:5], 0
		; GFX90A-NEXT: BB29_1: ; %atomicrmw.start
		; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
		; GFX90A-NEXT: s_waitcnt vmcnt(0)
		; GFX90A-NEXT: v_pk_mov_b32 v[4:5], v[2:3], v[2:3] op_sel:[0,1]
		; GFX90A-NEXT: v_add_f64 v[2:3], v[4:5], 4.0
		; GFX90A-NEXT: buffer_wbl2
		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: global_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5], off glc scc
		; GFX90A-NEXT: s_waitcnt vmcnt(0)
		; GFX90A-NEXT: buffer_invl2
		; GFX90A-NEXT: s_waitcnt vmcnt(0)
		; GFX90A-NEXT: buffer_wbinvl1_vol
		; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5]
		; GFX90A-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
		; GFX90A-NEXT: s_andn2_b64 exec, exec, s[4:5]
		; GFX90A-NEXT: s_cbranch_execnz BB29_1
		; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end
		; GFX90A-NEXT: s_or_b64 exec, exec, s[4:5]
		; GFX90A-NEXT: v_mov_b32_e32 v0, v2
		; GFX90A-NEXT: v_mov_b32_e32 v1, v3
		; GFX90A-NEXT: s_setpc_b64 s[30:31]
		main_body:
		%ret = atomicrmw fadd double addrspace(1)* %ptr, double 4.0 seq_cst
		ret double %ret
		}

		define double @global_atomic_fadd_f64_rtn_pat_agent(double addrspace(1)* %ptr, double %data) #1 {
		; GFX90A-LABEL: global_atomic_fadd_f64_rtn_pat_agent:
		; GFX90A: ; %bb.0: ; %main_body
		; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX90A-NEXT: v_mov_b32_e32 v2, 0		; GFX90A-NEXT: v_mov_b32_e32 v2, 0
; GFX90A-NEXT: v_mov_b32_e32 v3, 0x40100000		; GFX90A-NEXT: v_mov_b32_e32 v3, 0x40100000
		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: global_atomic_add_f64 v[0:1], v[0:1], v[2:3], off glc
		; GFX90A-NEXT: s_waitcnt vmcnt(0)
		; GFX90A-NEXT: buffer_wbinvl1_vol
		; GFX90A-NEXT: s_setpc_b64 s[30:31]
		main_body:
		%ret = atomicrmw fadd double addrspace(1)* %ptr, double 4.0 syncscope("agent") seq_cst
		ret double %ret
		}

		define double @global_atomic_fadd_f64_rtn_pat_system(double addrspace(1)* %ptr, double %data) #1 {
		; GFX90A-LABEL: global_atomic_fadd_f64_rtn_pat_system:
		; GFX90A: ; %bb.0: ; %main_body
		; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: global_load_dwordx2 v[2:3], v[0:1], off
		; GFX90A-NEXT: s_mov_b64 s[4:5], 0
		; GFX90A-NEXT: BB31_1: ; %atomicrmw.start
		; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
		; GFX90A-NEXT: s_waitcnt vmcnt(0)
		; GFX90A-NEXT: v_pk_mov_b32 v[4:5], v[2:3], v[2:3] op_sel:[0,1]
		; GFX90A-NEXT: v_add_f64 v[2:3], v[4:5], 4.0
; GFX90A-NEXT: buffer_wbl2		; GFX90A-NEXT: buffer_wbl2
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: global_atomic_add_f64 v[0:1], v[0:1], v[2:3], off glc scc		; GFX90A-NEXT: global_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5], off glc scc
; GFX90A-NEXT: s_waitcnt vmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_invl2		; GFX90A-NEXT: buffer_invl2
; GFX90A-NEXT: s_waitcnt vmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_wbinvl1_vol		; GFX90A-NEXT: buffer_wbinvl1_vol
		; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5]
		; GFX90A-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
		; GFX90A-NEXT: s_andn2_b64 exec, exec, s[4:5]
		; GFX90A-NEXT: s_cbranch_execnz BB31_1
		; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end
		; GFX90A-NEXT: s_or_b64 exec, exec, s[4:5]
		; GFX90A-NEXT: v_mov_b32_e32 v0, v2
		; GFX90A-NEXT: v_mov_b32_e32 v1, v3
; GFX90A-NEXT: s_setpc_b64 s[30:31]		; GFX90A-NEXT: s_setpc_b64 s[30:31]
main_body:		main_body:
%ret = atomicrmw fadd double addrspace(1)* %ptr, double 4.0 seq_cst		%ret = atomicrmw fadd double addrspace(1)* %ptr, double 4.0 syncscope("one-as") seq_cst
ret double %ret		ret double %ret
}		}

define double @global_atomic_fmax_f64_rtn(double addrspace(1)* %ptr, double %data) {		define double @global_atomic_fmax_f64_rtn(double addrspace(1)* %ptr, double %data) {
; GFX90A-LABEL: global_atomic_fmax_f64_rtn:		; GFX90A-LABEL: global_atomic_fmax_f64_rtn:
; GFX90A: ; %bb.0: ; %main_body		; GFX90A: ; %bb.0: ; %main_body
; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX90A-NEXT: global_atomic_max_f64 v[0:1], v[0:1], v[2:3], off glc		; GFX90A-NEXT: global_atomic_max_f64 v[0:1], v[0:1], v[2:3], off glc
[15 lines not shown]	main_body:
%ret = call double @llvm.amdgcn.global.atomic.fmin.f64.p1f64.f64(double addrspace(1)* %ptr, double %data)		%ret = call double @llvm.amdgcn.global.atomic.fmin.f64.p1f64.f64(double addrspace(1)* %ptr, double %data)
ret double %ret		ret double %ret
}		}

define amdgpu_kernel void @flat_atomic_fadd_f64_noret_pat(double* %ptr) #1 {		define amdgpu_kernel void @flat_atomic_fadd_f64_noret_pat(double* %ptr) #1 {
; GFX90A-LABEL: flat_atomic_fadd_f64_noret_pat:		; GFX90A-LABEL: flat_atomic_fadd_f64_noret_pat:
; GFX90A: ; %bb.0: ; %main_body		; GFX90A: ; %bb.0: ; %main_body
; GFX90A-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX90A-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX90A-NEXT: s_mov_b64 s[2:3], 0
		; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
		; GFX90A-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
		; GFX90A-NEXT: flat_load_dwordx2 v[2:3], v[0:1]
		; GFX90A-NEXT: BB34_1: ; %atomicrmw.start
		; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: v_add_f64 v[0:1], v[2:3], 4.0
		; GFX90A-NEXT: v_pk_mov_b32 v[4:5], s[0:1], s[0:1] op_sel:[0,1]
		; GFX90A-NEXT: buffer_wbl2
		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: flat_atomic_cmpswap_x2 v[0:1], v[4:5], v[0:3] glc scc
		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: buffer_invl2
		; GFX90A-NEXT: s_waitcnt vmcnt(0)
		; GFX90A-NEXT: buffer_wbinvl1_vol
		; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[0:1], v[2:3]
		; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
		; GFX90A-NEXT: v_pk_mov_b32 v[2:3], v[0:1], v[0:1] op_sel:[0,1]
		; GFX90A-NEXT: s_andn2_b64 exec, exec, s[2:3]
		; GFX90A-NEXT: s_cbranch_execnz BB34_1
		; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end
		; GFX90A-NEXT: s_endpgm
		main_body:
		%ret = atomicrmw fadd double* %ptr, double 4.0 seq_cst
		ret void
		}

		define amdgpu_kernel void @flat_atomic_fadd_f64_noret_pat_agent(double* %ptr) #1 {
		; GFX90A-LABEL: flat_atomic_fadd_f64_noret_pat_agent:
		; GFX90A: ; %bb.0: ; %main_body
		; GFX90A-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX90A-NEXT: v_mov_b32_e32 v2, 0		; GFX90A-NEXT: v_mov_b32_e32 v2, 0
; GFX90A-NEXT: v_mov_b32_e32 v3, 0x40100000		; GFX90A-NEXT: v_mov_b32_e32 v3, 0x40100000
; GFX90A-NEXT: s_waitcnt lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]		; GFX90A-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: flat_atomic_add_f64 v[0:1], v[2:3]
		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: buffer_wbinvl1_vol
		; GFX90A-NEXT: s_endpgm
		main_body:
		%ret = atomicrmw fadd double* %ptr, double 4.0 syncscope("agent") seq_cst
		ret void
		}

		define amdgpu_kernel void @flat_atomic_fadd_f64_noret_pat_system(double* %ptr) #1 {
		; GFX90A-LABEL: flat_atomic_fadd_f64_noret_pat_system:
		; GFX90A: ; %bb.0: ; %main_body
		; GFX90A-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX90A-NEXT: s_mov_b64 s[2:3], 0
		; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
		; GFX90A-NEXT: v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
		; GFX90A-NEXT: flat_load_dwordx2 v[2:3], v[0:1]
		; GFX90A-NEXT: BB36_1: ; %atomicrmw.start
		; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: v_add_f64 v[0:1], v[2:3], 4.0
		; GFX90A-NEXT: v_pk_mov_b32 v[4:5], s[0:1], s[0:1] op_sel:[0,1]
; GFX90A-NEXT: buffer_wbl2		; GFX90A-NEXT: buffer_wbl2
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: flat_atomic_add_f64 v[0:1], v[2:3] scc		; GFX90A-NEXT: flat_atomic_cmpswap_x2 v[0:1], v[4:5], v[0:3] glc scc
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: buffer_invl2		; GFX90A-NEXT: buffer_invl2
; GFX90A-NEXT: s_waitcnt vmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_wbinvl1_vol		; GFX90A-NEXT: buffer_wbinvl1_vol
		; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[0:1], v[2:3]
		; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
		; GFX90A-NEXT: v_pk_mov_b32 v[2:3], v[0:1], v[0:1] op_sel:[0,1]
		; GFX90A-NEXT: s_andn2_b64 exec, exec, s[2:3]
		; GFX90A-NEXT: s_cbranch_execnz BB36_1
		; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end
; GFX90A-NEXT: s_endpgm		; GFX90A-NEXT: s_endpgm
main_body:		main_body:
%ret = atomicrmw fadd double* %ptr, double 4.0 seq_cst		%ret = atomicrmw fadd double* %ptr, double 4.0 syncscope("one-as") seq_cst
ret void		ret void
}		}

define double @flat_atomic_fadd_f64_rtn_pat(double* %ptr) #1 {		define double @flat_atomic_fadd_f64_rtn_pat(double* %ptr) #1 {
; GFX90A-LABEL: flat_atomic_fadd_f64_rtn_pat:		; GFX90A-LABEL: flat_atomic_fadd_f64_rtn_pat:
; GFX90A: ; %bb.0: ; %main_body		; GFX90A: ; %bb.0: ; %main_body
; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: flat_load_dwordx2 v[2:3], v[0:1]
		; GFX90A-NEXT: s_mov_b64 s[4:5], 0
		; GFX90A-NEXT: BB37_1: ; %atomicrmw.start
		; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: v_pk_mov_b32 v[4:5], v[2:3], v[2:3] op_sel:[0,1]
		; GFX90A-NEXT: v_add_f64 v[2:3], v[4:5], 4.0
		; GFX90A-NEXT: buffer_wbl2
		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: flat_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5] glc scc
		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: buffer_invl2
		; GFX90A-NEXT: s_waitcnt vmcnt(0)
		; GFX90A-NEXT: buffer_wbinvl1_vol
		; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5]
		; GFX90A-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
		; GFX90A-NEXT: s_andn2_b64 exec, exec, s[4:5]
		; GFX90A-NEXT: s_cbranch_execnz BB37_1
		; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end
		; GFX90A-NEXT: s_or_b64 exec, exec, s[4:5]
		; GFX90A-NEXT: v_mov_b32_e32 v0, v2
		; GFX90A-NEXT: v_mov_b32_e32 v1, v3
		; GFX90A-NEXT: s_setpc_b64 s[30:31]
		main_body:
		%ret = atomicrmw fadd double* %ptr, double 4.0 seq_cst
		ret double %ret
		}

		define double @flat_atomic_fadd_f64_rtn_pat_agent(double* %ptr) #1 {
		; GFX90A-LABEL: flat_atomic_fadd_f64_rtn_pat_agent:
		; GFX90A: ; %bb.0: ; %main_body
		; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX90A-NEXT: v_mov_b32_e32 v2, 0		; GFX90A-NEXT: v_mov_b32_e32 v2, 0
; GFX90A-NEXT: v_mov_b32_e32 v3, 0x40100000		; GFX90A-NEXT: v_mov_b32_e32 v3, 0x40100000
		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: flat_atomic_add_f64 v[0:1], v[0:1], v[2:3] glc
		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: buffer_wbinvl1_vol
		; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
		; GFX90A-NEXT: s_setpc_b64 s[30:31]
		main_body:
		%ret = atomicrmw fadd double* %ptr, double 4.0 syncscope("agent") seq_cst
		ret double %ret
		}

		define double @flat_atomic_fadd_f64_rtn_pat_system(double* %ptr) #1 {
		; GFX90A-LABEL: flat_atomic_fadd_f64_rtn_pat_system:
		; GFX90A: ; %bb.0: ; %main_body
		; GFX90A-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: flat_load_dwordx2 v[2:3], v[0:1]
		; GFX90A-NEXT: s_mov_b64 s[4:5], 0
		; GFX90A-NEXT: BB39_1: ; %atomicrmw.start
		; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
		; GFX90A-NEXT: v_pk_mov_b32 v[4:5], v[2:3], v[2:3] op_sel:[0,1]
		; GFX90A-NEXT: v_add_f64 v[2:3], v[4:5], 4.0
; GFX90A-NEXT: buffer_wbl2		; GFX90A-NEXT: buffer_wbl2
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: flat_atomic_add_f64 v[0:1], v[0:1], v[2:3] glc scc		; GFX90A-NEXT: flat_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5] glc scc
; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX90A-NEXT: buffer_invl2		; GFX90A-NEXT: buffer_invl2
; GFX90A-NEXT: s_waitcnt vmcnt(0)		; GFX90A-NEXT: s_waitcnt vmcnt(0)
; GFX90A-NEXT: buffer_wbinvl1_vol		; GFX90A-NEXT: buffer_wbinvl1_vol
		; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5]
		; GFX90A-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
		; GFX90A-NEXT: s_andn2_b64 exec, exec, s[4:5]
		; GFX90A-NEXT: s_cbranch_execnz BB39_1
		; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end
		; GFX90A-NEXT: s_or_b64 exec, exec, s[4:5]
		; GFX90A-NEXT: v_mov_b32_e32 v0, v2
		; GFX90A-NEXT: v_mov_b32_e32 v1, v3
; GFX90A-NEXT: s_setpc_b64 s[30:31]		; GFX90A-NEXT: s_setpc_b64 s[30:31]
main_body:		main_body:
%ret = atomicrmw fadd double* %ptr, double 4.0 seq_cst		%ret = atomicrmw fadd double* %ptr, double 4.0 syncscope("one-as") seq_cst
ret double %ret		ret double %ret
}		}

define amdgpu_kernel void @flat_atomic_fadd_f64_noret(double* %ptr, double %data) {		define amdgpu_kernel void @flat_atomic_fadd_f64_noret(double* %ptr, double %data) {
; GFX90A-LABEL: flat_atomic_fadd_f64_noret:		; GFX90A-LABEL: flat_atomic_fadd_f64_noret:
; GFX90A: ; %bb.0: ; %main_body		; GFX90A: ; %bb.0: ; %main_body
; GFX90A-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24		; GFX90A-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX90A-NEXT: s_waitcnt lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
[125 lines not shown]
define amdgpu_kernel void @local_atomic_fadd_f64_noret_pat_flush(double addrspace(3)* %ptr) #0 {		define amdgpu_kernel void @local_atomic_fadd_f64_noret_pat_flush(double addrspace(3)* %ptr) #0 {
; GFX90A-LABEL: local_atomic_fadd_f64_noret_pat_flush:		; GFX90A-LABEL: local_atomic_fadd_f64_noret_pat_flush:
; GFX90A: ; %bb.0: ; %main_body		; GFX90A: ; %bb.0: ; %main_body
; GFX90A-NEXT: s_load_dword s0, s[0:1], 0x24		; GFX90A-NEXT: s_load_dword s0, s[0:1], 0x24
; GFX90A-NEXT: s_mov_b64 s[2:3], 0		; GFX90A-NEXT: s_mov_b64 s[2:3], 0
; GFX90A-NEXT: s_waitcnt lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NEXT: v_mov_b32_e32 v0, s0		; GFX90A-NEXT: v_mov_b32_e32 v0, s0
; GFX90A-NEXT: ds_read_b64 v[0:1], v0		; GFX90A-NEXT: ds_read_b64 v[0:1], v0
; GFX90A-NEXT: BB41_1: ; %atomicrmw.start		; GFX90A-NEXT: BB49_1: ; %atomicrmw.start
; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1		; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX90A-NEXT: s_waitcnt lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NEXT: v_add_f64 v[2:3], v[0:1], 4.0		; GFX90A-NEXT: v_add_f64 v[2:3], v[0:1], 4.0
; GFX90A-NEXT: v_mov_b32_e32 v4, s0		; GFX90A-NEXT: v_mov_b32_e32 v4, s0
; GFX90A-NEXT: s_waitcnt lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NEXT: ds_cmpst_rtn_b64 v[2:3], v4, v[0:1], v[2:3]		; GFX90A-NEXT: ds_cmpst_rtn_b64 v[2:3], v4, v[0:1], v[2:3]
; GFX90A-NEXT: s_waitcnt lgkmcnt(0)		; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[0:1]		; GFX90A-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[0:1]
; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]		; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
; GFX90A-NEXT: v_pk_mov_b32 v[0:1], v[2:3], v[2:3] op_sel:[0,1]		; GFX90A-NEXT: v_pk_mov_b32 v[0:1], v[2:3], v[2:3] op_sel:[0,1]
; GFX90A-NEXT: s_andn2_b64 exec, exec, s[2:3]		; GFX90A-NEXT: s_andn2_b64 exec, exec, s[2:3]
; GFX90A-NEXT: s_cbranch_execnz BB41_1		; GFX90A-NEXT: s_cbranch_execnz BB49_1
; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end		; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end
; GFX90A-NEXT: s_endpgm		; GFX90A-NEXT: s_endpgm
main_body:		main_body:
%ret = atomicrmw fadd double addrspace(3)* %ptr, double 4.0 seq_cst		%ret = atomicrmw fadd double addrspace(3)* %ptr, double 4.0 seq_cst
ret void		ret void
}		}

define double @local_atomic_fadd_f64_rtn_pat(double addrspace(3)* %ptr, double %data) #1 {		define double @local_atomic_fadd_f64_rtn_pat(double addrspace(3)* %ptr, double %data) #1 {
[16 lines not shown]

llvm/test/CodeGen/AMDGPU/global-atomics-fp.ll

	[53 lines not shown]
	; GFX908-NEXT: ; %bb.2: ; %atomicrmw.end			; GFX908-NEXT: ; %bb.2: ; %atomicrmw.end
	; GFX908-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX908-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX908-NEXT: global_store_dword v[0:1], v0, off			; GFX908-NEXT: global_store_dword v[0:1], v0, off
	; GFX908-NEXT: s_endpgm			; GFX908-NEXT: s_endpgm
	;			;
	; GFX90A-LABEL: global_atomic_fadd_ret_f32:			; GFX90A-LABEL: global_atomic_fadd_ret_f32:
	; GFX90A: ; %bb.0:			; GFX90A: ; %bb.0:
	; GFX90A-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX90A-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX90A-NEXT: v_mov_b32_e32 v0, 0			; GFX90A-NEXT: s_mov_b64 s[2:3], 0
	; GFX90A-NEXT: v_mov_b32_e32 v1, 4.0			; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-NEXT: s_load_dword s4, s[0:1], 0x0
				; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-NEXT: v_mov_b32_e32 v0, s4
				; GFX90A-NEXT: BB0_1: ; %atomicrmw.start
				; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
				; GFX90A-NEXT: v_mov_b32_e32 v1, v0
				; GFX90A-NEXT: v_mov_b32_e32 v2, 0
				; GFX90A-NEXT: v_add_f32_e32 v0, 4.0, v1
	; GFX90A-NEXT: buffer_wbl2			; GFX90A-NEXT: buffer_wbl2
	; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-NEXT: global_atomic_add_f32 v0, v0, v1, s[0:1] glc scc			; GFX90A-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[0:1] glc scc
	; GFX90A-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NEXT: buffer_invl2			; GFX90A-NEXT: buffer_invl2
	; GFX90A-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NEXT: buffer_wbinvl1_vol			; GFX90A-NEXT: buffer_wbinvl1_vol
				; GFX90A-NEXT: v_cmp_eq_u32_e32 vcc, v0, v1
				; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
				; GFX90A-NEXT: s_andn2_b64 exec, exec, s[2:3]
				; GFX90A-NEXT: s_cbranch_execnz BB0_1
				; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end
				; GFX90A-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX90A-NEXT: global_store_dword v[0:1], v0, off			; GFX90A-NEXT: global_store_dword v[0:1], v0, off
	; GFX90A-NEXT: s_endpgm			; GFX90A-NEXT: s_endpgm
	%result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 seq_cst			%result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 seq_cst
	store float %result, float addrspace(1)* undef			store float %result, float addrspace(1)* undef
	ret void			ret void
	}			}

	define amdgpu_kernel void @global_atomic_fadd_ret_f32_ieee(float addrspace(1)* %ptr) #2 {			define amdgpu_kernel void @global_atomic_fadd_ret_f32_ieee(float addrspace(1)* %ptr) #2 {
	[72 lines not shown]
	; GFX90A-NEXT: v_cmp_eq_u32_e32 vcc, v0, v1			; GFX90A-NEXT: v_cmp_eq_u32_e32 vcc, v0, v1
	; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]			; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
	; GFX90A-NEXT: s_andn2_b64 exec, exec, s[2:3]			; GFX90A-NEXT: s_andn2_b64 exec, exec, s[2:3]
	; GFX90A-NEXT: s_cbranch_execnz BB1_1			; GFX90A-NEXT: s_cbranch_execnz BB1_1
	; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end			; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end
	; GFX90A-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX90A-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX90A-NEXT: global_store_dword v[0:1], v0, off			; GFX90A-NEXT: global_store_dword v[0:1], v0, off
	; GFX90A-NEXT: s_endpgm			; GFX90A-NEXT: s_endpgm
	%result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 seq_cst			%result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 syncscope("agent") seq_cst
	store float %result, float addrspace(1)* undef			store float %result, float addrspace(1)* undef
	ret void			ret void
	}			}

	define amdgpu_kernel void @global_atomic_fadd_noret_f32(float addrspace(1)* %ptr) #0 {			define amdgpu_kernel void @global_atomic_fadd_noret_f32(float addrspace(1)* %ptr) #0 {
	; GFX900-LABEL: global_atomic_fadd_noret_f32:			; GFX900-LABEL: global_atomic_fadd_noret_f32:
	; GFX900: ; %bb.0:			; GFX900: ; %bb.0:
	; GFX900-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX900-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	[29 lines not shown]
	; GFX908-NEXT: buffer_wbinvl1_vol			; GFX908-NEXT: buffer_wbinvl1_vol
	; GFX908-NEXT: s_endpgm			; GFX908-NEXT: s_endpgm
	;			;
	; GFX90A-LABEL: global_atomic_fadd_noret_f32:			; GFX90A-LABEL: global_atomic_fadd_noret_f32:
	; GFX90A: ; %bb.0:			; GFX90A: ; %bb.0:
	; GFX90A-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX90A-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX90A-NEXT: v_mov_b32_e32 v0, 0			; GFX90A-NEXT: v_mov_b32_e32 v0, 0
	; GFX90A-NEXT: v_mov_b32_e32 v1, 4.0			; GFX90A-NEXT: v_mov_b32_e32 v1, 4.0
	; GFX90A-NEXT: buffer_wbl2
	; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-NEXT: global_atomic_add_f32 v0, v1, s[0:1] scc			; GFX90A-NEXT: global_atomic_add_f32 v0, v1, s[0:1]
	; GFX90A-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NEXT: buffer_invl2
	; GFX90A-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NEXT: buffer_wbinvl1_vol			; GFX90A-NEXT: buffer_wbinvl1_vol
	; GFX90A-NEXT: s_endpgm			; GFX90A-NEXT: s_endpgm
	%result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 seq_cst			%result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 syncscope("agent") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @global_atomic_fadd_noret_f32_ieee(float addrspace(1)* %ptr) #2 {			define amdgpu_kernel void @global_atomic_fadd_noret_f32_ieee(float addrspace(1)* %ptr) #2 {
	; GFX900-LABEL: global_atomic_fadd_noret_f32_ieee:			; GFX900-LABEL: global_atomic_fadd_noret_f32_ieee:
	; GFX900: ; %bb.0:			; GFX900: ; %bb.0:
	; GFX900-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX900-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX900-NEXT: s_mov_b64 s[2:3], 0			; GFX900-NEXT: s_mov_b64 s[2:3], 0
	[62 lines not shown]
	; GFX90A-NEXT: buffer_wbinvl1_vol			; GFX90A-NEXT: buffer_wbinvl1_vol
	; GFX90A-NEXT: v_cmp_eq_u32_e32 vcc, v0, v1			; GFX90A-NEXT: v_cmp_eq_u32_e32 vcc, v0, v1
	; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]			; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
	; GFX90A-NEXT: v_mov_b32_e32 v1, v0			; GFX90A-NEXT: v_mov_b32_e32 v1, v0
	; GFX90A-NEXT: s_andn2_b64 exec, exec, s[2:3]			; GFX90A-NEXT: s_andn2_b64 exec, exec, s[2:3]
	; GFX90A-NEXT: s_cbranch_execnz BB3_1			; GFX90A-NEXT: s_cbranch_execnz BB3_1
	; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end			; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end
	; GFX90A-NEXT: s_endpgm			; GFX90A-NEXT: s_endpgm
	%result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 seq_cst			%result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 syncscope("agent") seq_cst
				ret void
				}

				define amdgpu_kernel void @global_atomic_fadd_ret_f32_agent(float addrspace(1)* %ptr) #0 {
				; GFX900-LABEL: global_atomic_fadd_ret_f32_agent:
				; GFX900: ; %bb.0:
				; GFX900-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX900-NEXT: s_mov_b64 s[2:3], 0
				; GFX900-NEXT: s_waitcnt lgkmcnt(0)
				; GFX900-NEXT: s_load_dword s4, s[0:1], 0x0
				; GFX900-NEXT: s_waitcnt lgkmcnt(0)
				; GFX900-NEXT: v_mov_b32_e32 v0, s4
				; GFX900-NEXT: BB4_1: ; %atomicrmw.start
				; GFX900-NEXT: ; =>This Inner Loop Header: Depth=1
				; GFX900-NEXT: v_mov_b32_e32 v1, v0
				; GFX900-NEXT: v_mov_b32_e32 v2, 0
				; GFX900-NEXT: v_add_f32_e32 v0, 4.0, v1
				; GFX900-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
				; GFX900-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[0:1] glc
				; GFX900-NEXT: s_waitcnt vmcnt(0)
				; GFX900-NEXT: buffer_wbinvl1_vol
				; GFX900-NEXT: v_cmp_eq_u32_e32 vcc, v0, v1
				; GFX900-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
				; GFX900-NEXT: s_andn2_b64 exec, exec, s[2:3]
				; GFX900-NEXT: s_cbranch_execnz BB4_1
				; GFX900-NEXT: ; %bb.2: ; %atomicrmw.end
				; GFX900-NEXT: s_or_b64 exec, exec, s[2:3]
				; GFX900-NEXT: global_store_dword v[0:1], v0, off
				; GFX900-NEXT: s_endpgm
				;
				; GFX908-LABEL: global_atomic_fadd_ret_f32_agent:
				; GFX908: ; %bb.0:
				; GFX908-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX908-NEXT: s_mov_b64 s[2:3], 0
				; GFX908-NEXT: s_waitcnt lgkmcnt(0)
				; GFX908-NEXT: s_load_dword s4, s[0:1], 0x0
				; GFX908-NEXT: s_waitcnt lgkmcnt(0)
				; GFX908-NEXT: v_mov_b32_e32 v0, s4
				; GFX908-NEXT: BB4_1: ; %atomicrmw.start
				; GFX908-NEXT: ; =>This Inner Loop Header: Depth=1
				; GFX908-NEXT: v_mov_b32_e32 v1, v0
				; GFX908-NEXT: v_mov_b32_e32 v2, 0
				; GFX908-NEXT: v_add_f32_e32 v0, 4.0, v1
				; GFX908-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
				; GFX908-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[0:1] glc
				; GFX908-NEXT: s_waitcnt vmcnt(0)
				; GFX908-NEXT: buffer_wbinvl1_vol
				; GFX908-NEXT: v_cmp_eq_u32_e32 vcc, v0, v1
				; GFX908-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
				; GFX908-NEXT: s_andn2_b64 exec, exec, s[2:3]
				; GFX908-NEXT: s_cbranch_execnz BB4_1
				; GFX908-NEXT: ; %bb.2: ; %atomicrmw.end
				; GFX908-NEXT: s_or_b64 exec, exec, s[2:3]
				; GFX908-NEXT: global_store_dword v[0:1], v0, off
				; GFX908-NEXT: s_endpgm
				;
				; GFX90A-LABEL: global_atomic_fadd_ret_f32_agent:
				; GFX90A: ; %bb.0:
				; GFX90A-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX90A-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-NEXT: v_mov_b32_e32 v1, 4.0
				; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
				; GFX90A-NEXT: global_atomic_add_f32 v0, v0, v1, s[0:1] glc
				; GFX90A-NEXT: s_waitcnt vmcnt(0)
				; GFX90A-NEXT: buffer_wbinvl1_vol
				; GFX90A-NEXT: global_store_dword v[0:1], v0, off
				; GFX90A-NEXT: s_endpgm
				%result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 syncscope("agent") seq_cst
				store float %result, float addrspace(1)* undef
				ret void
				}

				define amdgpu_kernel void @global_atomic_fadd_ret_f32_system(float addrspace(1)* %ptr) #0 {
				; GFX900-LABEL: global_atomic_fadd_ret_f32_system:
				; GFX900: ; %bb.0:
				; GFX900-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX900-NEXT: s_mov_b64 s[2:3], 0
				; GFX900-NEXT: s_waitcnt lgkmcnt(0)
				; GFX900-NEXT: s_load_dword s4, s[0:1], 0x0
				; GFX900-NEXT: s_waitcnt lgkmcnt(0)
				; GFX900-NEXT: v_mov_b32_e32 v0, s4
				; GFX900-NEXT: BB5_1: ; %atomicrmw.start
				; GFX900-NEXT: ; =>This Inner Loop Header: Depth=1
				; GFX900-NEXT: v_mov_b32_e32 v1, v0
				; GFX900-NEXT: v_mov_b32_e32 v2, 0
				; GFX900-NEXT: v_add_f32_e32 v0, 4.0, v1
				; GFX900-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
				; GFX900-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[0:1] glc
				; GFX900-NEXT: s_waitcnt vmcnt(0)
				; GFX900-NEXT: buffer_wbinvl1_vol
				; GFX900-NEXT: v_cmp_eq_u32_e32 vcc, v0, v1
				; GFX900-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
				; GFX900-NEXT: s_andn2_b64 exec, exec, s[2:3]
				; GFX900-NEXT: s_cbranch_execnz BB5_1
				; GFX900-NEXT: ; %bb.2: ; %atomicrmw.end
				; GFX900-NEXT: s_or_b64 exec, exec, s[2:3]
				; GFX900-NEXT: global_store_dword v[0:1], v0, off
				; GFX900-NEXT: s_endpgm
				;
				; GFX908-LABEL: global_atomic_fadd_ret_f32_system:
				; GFX908: ; %bb.0:
				; GFX908-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX908-NEXT: s_mov_b64 s[2:3], 0
				; GFX908-NEXT: s_waitcnt lgkmcnt(0)
				; GFX908-NEXT: s_load_dword s4, s[0:1], 0x0
				; GFX908-NEXT: s_waitcnt lgkmcnt(0)
				; GFX908-NEXT: v_mov_b32_e32 v0, s4
				; GFX908-NEXT: BB5_1: ; %atomicrmw.start
				; GFX908-NEXT: ; =>This Inner Loop Header: Depth=1
				; GFX908-NEXT: v_mov_b32_e32 v1, v0
				; GFX908-NEXT: v_mov_b32_e32 v2, 0
				; GFX908-NEXT: v_add_f32_e32 v0, 4.0, v1
				; GFX908-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
				; GFX908-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[0:1] glc
				; GFX908-NEXT: s_waitcnt vmcnt(0)
				; GFX908-NEXT: buffer_wbinvl1_vol
				; GFX908-NEXT: v_cmp_eq_u32_e32 vcc, v0, v1
				; GFX908-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
				; GFX908-NEXT: s_andn2_b64 exec, exec, s[2:3]
				; GFX908-NEXT: s_cbranch_execnz BB5_1
				; GFX908-NEXT: ; %bb.2: ; %atomicrmw.end
				; GFX908-NEXT: s_or_b64 exec, exec, s[2:3]
				; GFX908-NEXT: global_store_dword v[0:1], v0, off
				; GFX908-NEXT: s_endpgm
				;
				; GFX90A-LABEL: global_atomic_fadd_ret_f32_system:
				; GFX90A: ; %bb.0:
				; GFX90A-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX90A-NEXT: s_mov_b64 s[2:3], 0
				; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-NEXT: s_load_dword s4, s[0:1], 0x0
				; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-NEXT: v_mov_b32_e32 v0, s4
				; GFX90A-NEXT: BB5_1: ; %atomicrmw.start
				; GFX90A-NEXT: ; =>This Inner Loop Header: Depth=1
				; GFX90A-NEXT: v_mov_b32_e32 v1, v0
				; GFX90A-NEXT: v_mov_b32_e32 v2, 0
				; GFX90A-NEXT: v_add_f32_e32 v0, 4.0, v1
				; GFX90A-NEXT: buffer_wbl2
				; GFX90A-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
				; GFX90A-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[0:1] glc scc
				; GFX90A-NEXT: s_waitcnt vmcnt(0)
				; GFX90A-NEXT: buffer_invl2
				; GFX90A-NEXT: s_waitcnt vmcnt(0)
				; GFX90A-NEXT: buffer_wbinvl1_vol
				; GFX90A-NEXT: v_cmp_eq_u32_e32 vcc, v0, v1
				; GFX90A-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
				; GFX90A-NEXT: s_andn2_b64 exec, exec, s[2:3]
				; GFX90A-NEXT: s_cbranch_execnz BB5_1
				; GFX90A-NEXT: ; %bb.2: ; %atomicrmw.end
				; GFX90A-NEXT: s_or_b64 exec, exec, s[2:3]
				; GFX90A-NEXT: global_store_dword v[0:1], v0, off
				; GFX90A-NEXT: s_endpgm
				%result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 syncscope("one-as") seq_cst
				store float %result, float addrspace(1)* undef
	ret void			ret void
	}			}

	define amdgpu_kernel void @global_atomic_fadd_ret_f32_wrong_subtarget(float addrspace(1)* %ptr) #1 {			define amdgpu_kernel void @global_atomic_fadd_ret_f32_wrong_subtarget(float addrspace(1)* %ptr) #1 {
	; GCN-LABEL: global_atomic_fadd_ret_f32_wrong_subtarget:			; GCN-LABEL: global_atomic_fadd_ret_f32_wrong_subtarget:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GCN-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GCN-NEXT: s_mov_b64 s[2:3], 0			; GCN-NEXT: s_mov_b64 s[2:3], 0
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: s_load_dword s4, s[0:1], 0x0			; GCN-NEXT: s_load_dword s4, s[0:1], 0x0
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v0, s4			; GCN-NEXT: v_mov_b32_e32 v0, s4
	; GCN-NEXT: BB4_1: ; %atomicrmw.start			; GCN-NEXT: BB6_1: ; %atomicrmw.start
	; GCN-NEXT: ; =>This Inner Loop Header: Depth=1			; GCN-NEXT: ; =>This Inner Loop Header: Depth=1
	; GCN-NEXT: v_mov_b32_e32 v1, v0			; GCN-NEXT: v_mov_b32_e32 v1, v0
	; GCN-NEXT: v_mov_b32_e32 v2, 0			; GCN-NEXT: v_mov_b32_e32 v2, 0
	; GCN-NEXT: v_add_f32_e32 v0, 4.0, v1			; GCN-NEXT: v_add_f32_e32 v0, 4.0, v1
	; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GCN-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[0:1] glc			; GCN-NEXT: global_atomic_cmpswap v0, v2, v[0:1], s[0:1] glc
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_wbinvl1_vol			; GCN-NEXT: buffer_wbinvl1_vol
	; GCN-NEXT: v_cmp_eq_u32_e32 vcc, v0, v1			; GCN-NEXT: v_cmp_eq_u32_e32 vcc, v0, v1
	; GCN-NEXT: s_or_b64 s[2:3], vcc, s[2:3]			; GCN-NEXT: s_or_b64 s[2:3], vcc, s[2:3]
	; GCN-NEXT: s_andn2_b64 exec, exec, s[2:3]			; GCN-NEXT: s_andn2_b64 exec, exec, s[2:3]
	; GCN-NEXT: s_cbranch_execnz BB4_1			; GCN-NEXT: s_cbranch_execnz BB6_1
	; GCN-NEXT: ; %bb.2: ; %atomicrmw.end			; GCN-NEXT: ; %bb.2: ; %atomicrmw.end
	; GCN-NEXT: s_or_b64 exec, exec, s[2:3]			; GCN-NEXT: s_or_b64 exec, exec, s[2:3]
	; GCN-NEXT: global_store_dword v[0:1], v0, off			; GCN-NEXT: global_store_dword v[0:1], v0, off
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	%result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 seq_cst			%result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 syncscope("agent") seq_cst
	store float %result, float addrspace(1)* undef			store float %result, float addrspace(1)* undef
	ret void			ret void
	}			}

	define amdgpu_kernel void @global_atomic_fadd_noret_f32_wrong_subtarget(float addrspace(1)* %ptr) #1 {			define amdgpu_kernel void @global_atomic_fadd_noret_f32_wrong_subtarget(float addrspace(1)* %ptr) #1 {
	; GCN-LABEL: global_atomic_fadd_noret_f32_wrong_subtarget:			; GCN-LABEL: global_atomic_fadd_noret_f32_wrong_subtarget:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GCN-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: v_mov_b32_e32 v1, 4.0			; GCN-NEXT: v_mov_b32_e32 v1, 4.0
	; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GCN-NEXT: global_atomic_add_f32 v0, v1, s[0:1]			; GCN-NEXT: global_atomic_add_f32 v0, v1, s[0:1]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_wbinvl1_vol			; GCN-NEXT: buffer_wbinvl1_vol
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	%result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 seq_cst			%result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 syncscope("agent") seq_cst
	ret void			ret void
	}			}

	attributes #0 = { "denormal-fp-math-f32"="preserve-sign,preserve-sign" "amdgpu-unsafe-fp-atomics"="true" }			attributes #0 = { "denormal-fp-math-f32"="preserve-sign,preserve-sign" "amdgpu-unsafe-fp-atomics"="true" }
	attributes #1 = { "denormal-fp-math-f32"="preserve-sign,preserve-sign" "target-cpu"="gfx803" "target-features"="+atomic-fadd-insts" "amdgpu-unsafe-fp-atomics"="true" }			attributes #1 = { "denormal-fp-math-f32"="preserve-sign,preserve-sign" "target-cpu"="gfx803" "target-features"="+atomic-fadd-insts" "amdgpu-unsafe-fp-atomics"="true" }
	attributes #2 = { "amdgpu-unsafe-fp-atomics"="true" }			attributes #2 = { "amdgpu-unsafe-fp-atomics"="true" }

[AMDGPU] Always expand system scope fp atomics on gfx90a
Closed, Public
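
As a reading aid only: the GFX90A check lines in this diff, together with t-tye's comment above, suggest that the sync scope gates whether a native FP atomic may be selected. The sketch below models just that gate in standalone C++. The function name `scopeAllowsNativeFPAtomic` and its parameters are hypothetical and are not part of LLVM. The real decision lives in SIISelLowering.cpp and also depends on the subtarget, address space, type, and denormal mode, none of which is modeled here. Only the three scopes that appear in these tests are considered; any other scope string is conservatively treated as requiring expansion, which is an assumption of the sketch.

```cpp
#include <string>

// Hypothetical helper, not an LLVM API: models only the sync-scope gate
// suggested by the GFX90A check lines in this diff.
//
// syncScope       - the syncscope string on the atomicrmw; "" stands for the
//                   default (system) scope.
// unsafeFPAtomics - value of the "amdgpu-unsafe-fp-atomics" function attribute.
//
// Returns true when the scope alone would permit a native FP atomic
// instruction. This is a necessary condition, not a sufficient one.
bool scopeAllowsNativeFPAtomic(const std::string &syncScope,
                               bool unsafeFPAtomics) {
  // Without the unsafe attribute, assume the cmpxchg expansion is required.
  if (!unsafeFPAtomics)
    return false;
  // System scope ("") and "one-as" expand to a cmpxchg loop in these tests;
  // "agent" is the only scope here that keeps the native instruction.
  return syncScope == "agent";
}
```

Under this model, `scopeAllowsNativeFPAtomic("agent", true)` is true, in line with the `*_agent` tests that select `global_atomic_add_f32` or `flat_atomic_add_f64`, while `""` and `"one-as"` give false, in line with the tests that now expand to a cmpswap loop.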

Diff 329738

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

llvm/test/CodeGen/AMDGPU/fp64-atomics-gfx90a.ll

llvm/test/CodeGen/AMDGPU/global-atomics-fp.ll