This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GlobalISel] Add support for global atomicrmw fadd
Closed · Public

Authored by foad on Mar 2 2021, 6:17 AM.

Details

Reviewers

arsenm
Petar.Avramovic

Commits

rG5d0e9ddfa512: [AMDGPU][GlobalISel] Add support for global atomicrmw fadd

Summary

This includes gfx908 which only has a no-return version of the
global_atomic_add_f32 instruction, using the same hack that was
previously implemented for selecting from the
llvm.amdgcn.global.atomic.fadd intrinsic.
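
The decision that the summary calls a hack can be modeled outside LLVM as a small pure function. This is only a sketch based on the `selectGlobalAtomicFadd` body shown in the diff: `SelectorSubtargetModel`, `FaddSelection` and `classifyGlobalFadd` are hypothetical names that do not exist in LLVM, and the two boolean inputs stand in for `STI.hasGFX90AInsts()` and for the negation of `MRI->use_nodbg_empty(...)` on the result register.

```cpp
// Hypothetical stand-in for the subtarget query used by the selector.
struct SelectorSubtargetModel {
  bool HasGFX90AInsts; // models STI.hasGFX90AInsts()
};

// Hypothetical outcomes of selecting a global atomic fadd.
enum class FaddSelection {
  UseGeneratedPatterns, // defer to selectImpl(); return versions exist
  EmitNoReturnInst,     // build the no-return instruction by hand
  DiagnoseUnsupported   // "return versions of fp atomics not supported"
};

// Mirrors the order of checks in the diff: the gfx90a check comes first,
// then the check for uses of the result register.
constexpr FaddSelection classifyGlobalFadd(SelectorSubtargetModel ST,
                                           bool ResultHasUses) {
  if (ST.HasGFX90AInsts)
    return FaddSelection::UseGeneratedPatterns;
  if (ResultHasUses)
    return FaddSelection::DiagnoseUnsupported;
  return FaddSelection::EmitNoReturnInst;
}
```

Under this model, a target without the gfx90a instructions only accepts the operation when nothing reads the returned value, which is why the selector emits a diagnostic instead of silently dropping the result.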

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

foad created this revision. · Mar 2 2021, 6:17 AM

Herald added subscribers: kerbowa, jfb, hiraditya and 8 others. · Mar 2 2021, 6:17 AM

foad requested review of this revision. · Mar 2 2021, 6:17 AM

Herald added a project: Restricted Project. · Mar 2 2021, 6:17 AM

Herald added subscribers: llvm-commits, wdng.

arsenm added inline comments. · Mar 2 2021, 6:21 AM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1301	Isn't this also conditional on the denorm mode or the unsafe atomic attribute? That would need to be custom and verify those are consistent

Harbormaster completed remote builds in B91557: Diff 327430. · Mar 2 2021, 7:08 AM

foad added inline comments. · Mar 3 2021, 2:34 AM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1301	Those checks are done in SITargetLowering::shouldExpandAtomicRMWInIR for both selectiondag and globalisel.

Ping. Is it OK to rely on the atomic-expand pass having been run? That seems to be how SelectionDAG works.

In D97767#2610596, @foad wrote:

> Ping. Is it OK to rely on the atomic-expand pass having been run? That seems to be how SelectionDAG works.

Yes, although some additional verification wouldn't hurt, in case something decides to act on the legality information.

LGTM, though ensuring the right mode in the lowering. We probably won't have MIR atomic expansions anytime soon

This revision is now accepted and ready to land. · Mar 30 2021, 3:33 PM

This revision was landed with ongoing or failed builds. · Mar 31 2021, 3:15 AM

Closed by commit rG5d0e9ddfa512: [AMDGPU][GlobalISel] Add support for global atomicrmw fadd (authored by foad).

This revision was automatically updated to reflect the committed changes.

foad added a commit: rG5d0e9ddfa512: [AMDGPU][GlobalISel] Add support for global atomicrmw fadd.

Revision Contents

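Path                                                                      Size
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h                        3 lines
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp                      22 lines
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp                            6 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-atomicrmw-fadd-global.mir    22 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-atomicrmw-fadd-local.mir     (moved from legalize-atomicrmw-fadd.mir)
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-atomicrmw-fadd.mir           (moved to legalize-atomicrmw-fadd-local.mir)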

Diff 334393

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h

@@ ... @@ private:
   bool selectG_SELECT(MachineInstr &I) const;
   bool selectG_BRCOND(MachineInstr &I) const;
   bool selectG_GLOBAL_VALUE(MachineInstr &I) const;
   bool selectG_PTRMASK(MachineInstr &I) const;
   bool selectG_EXTRACT_VECTOR_ELT(MachineInstr &I) const;
   bool selectG_INSERT_VECTOR_ELT(MachineInstr &I) const;
   bool selectG_SHUFFLE_VECTOR(MachineInstr &I) const;
   bool selectAMDGPU_BUFFER_ATOMIC_FADD(MachineInstr &I) const;
-  bool selectGlobalAtomicFaddIntrinsic(MachineInstr &I) const;
+  bool selectGlobalAtomicFadd(MachineInstr &I, MachineOperand &AddrOp,
+                              MachineOperand &DataOp) const;
   bool selectBVHIntrinsic(MachineInstr &I) const;

   std::pair<Register, unsigned> selectVOP3ModsImpl(MachineOperand &Root,
                                                    bool AllowAbs = true) const;

   InstructionSelector::ComplexRendererFns
   selectVCSRC(MachineOperand &Root) const;

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

@@ ... @@ case Intrinsic::amdgcn_ds_gws_sema_release_all:
     return selectDSGWSIntrinsic(I, IntrinsicID);
   case Intrinsic::amdgcn_ds_append:
     return selectDSAppendConsume(I, true);
   case Intrinsic::amdgcn_ds_consume:
     return selectDSAppendConsume(I, false);
   case Intrinsic::amdgcn_s_barrier:
     return selectSBarrier(I);
   case Intrinsic::amdgcn_global_atomic_fadd:
-    return selectGlobalAtomicFaddIntrinsic(I);
+    return selectGlobalAtomicFadd(I, I.getOperand(2), I.getOperand(3));
   default: {
     return selectImpl(I, *CoverageInfo);
   }
   }
 }

 bool AMDGPUInstructionSelector::selectG_SELECT(MachineInstr &I) const {
   if (selectImpl(I, *CoverageInfo))
@@ ... @@ if ((AS == AMDGPUAS::LOCAL_ADDRESS || AS == AMDGPUAS::REGION_ADDRESS) &&
     // If DS instructions require M0 initializtion, insert it before selecting.
     BuildMI(*BB, &I, I.getDebugLoc(), TII.get(AMDGPU::S_MOV_B32), AMDGPU::M0)
       .addImm(-1);
   }
 }

 bool AMDGPUInstructionSelector::selectG_LOAD_STORE_ATOMICRMW(
   MachineInstr &I) const {
+  if (I.getOpcode() == TargetOpcode::G_ATOMICRMW_FADD) {
+    const LLT PtrTy = MRI->getType(I.getOperand(1).getReg());
+    unsigned AS = PtrTy.getAddressSpace();
+    if (AS == AMDGPUAS::GLOBAL_ADDRESS)
+      return selectGlobalAtomicFadd(I, I.getOperand(1), I.getOperand(2));
+  }
+
   initM0(I);
   return selectImpl(I, *CoverageInfo);
 }

 // TODO: No rtn optimization.
 bool AMDGPUInstructionSelector::selectG_AMDGPU_ATOMIC_CMPXCHG(
   MachineInstr &MI) const {
   Register PtrReg = MI.getOperand(1).getReg();
@@ ... @@ bool AMDGPUInstructionSelector::selectAMDGPU_BUFFER_ATOMIC_FADD(
   I.addImm(MI.getOperand(7).getImm()); // cpol
   I.cloneMemRefs(MI);

   MI.eraseFromParent();

   return true;
 }

-bool AMDGPUInstructionSelector::selectGlobalAtomicFaddIntrinsic(
-  MachineInstr &MI) const{
+bool AMDGPUInstructionSelector::selectGlobalAtomicFadd(
+    MachineInstr &MI, MachineOperand &AddrOp, MachineOperand &DataOp) const {

-  if (STI.hasGFX90AInsts())
+  if (STI.hasGFX90AInsts()) {
+    // gfx90a adds return versions of the global atomic fadd instructions so no
+    // special handling is required.
     return selectImpl(MI, *CoverageInfo);
+  }

   MachineBasicBlock *MBB = MI.getParent();
   const DebugLoc &DL = MI.getDebugLoc();

   if (!MRI->use_nodbg_empty(MI.getOperand(0).getReg())) {
     Function &F = MBB->getParent()->getFunction();
     DiagnosticInfoUnsupported
       NoFpRet(F, "return versions of fp atomics not supported",
               MI.getDebugLoc(), DS_Error);
     F.getContext().diagnose(NoFpRet);
     return false;
   }

   // FIXME: This is only needed because tablegen requires number of dst operands
   // in match and replace pattern to be the same. Otherwise patterns can be
   // exported from SDag path.
-  auto Addr = selectFlatOffsetImpl<true>(MI.getOperand(2));
+  auto Addr = selectFlatOffsetImpl<true>(AddrOp);

-  Register Data = MI.getOperand(3).getReg();
+  Register Data = DataOp.getReg();
   const unsigned Opc = MRI->getType(Data).isVector() ?
     AMDGPU::GLOBAL_ATOMIC_PK_ADD_F16 : AMDGPU::GLOBAL_ATOMIC_ADD_F32;
   auto MIB = BuildMI(*MBB, &MI, DL, TII.get(Opc))
     .addReg(Addr.first)
     .addReg(Data)
     .addImm(Addr.second)
     .addImm(0) // cpol
     .cloneMemRefs(MI);

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

@@ ... @@ auto &Atomics = getActionDefinitionsBuilder(
      G_ATOMICRMW_UMIN})
     .legalFor({{S32, GlobalPtr}, {S32, LocalPtr},
                {S64, GlobalPtr}, {S64, LocalPtr},
                {S32, RegionPtr}, {S64, RegionPtr}});
   if (ST.hasFlatAddressSpace()) {
     Atomics.legalFor({{S32, FlatPtr}, {S64, FlatPtr}});
   }

+  auto &Atomic = getActionDefinitionsBuilder(G_ATOMICRMW_FADD);
   if (ST.hasLDSFPAtomics()) {
-    auto &Atomic = getActionDefinitionsBuilder(G_ATOMICRMW_FADD)
-      .legalFor({{S32, LocalPtr}, {S32, RegionPtr}});
+    Atomic.legalFor({{S32, LocalPtr}, {S32, RegionPtr}});
     if (ST.hasGFX90AInsts())
       Atomic.legalFor({{S64, LocalPtr}});
   }
+  if (ST.hasAtomicFaddInsts())
+    Atomic.legalFor({{S32, GlobalPtr}});

Inline comments on the added Atomic.legalFor({{S32, GlobalPtr}}) line:

arsenm (Not Done): Isn't this also conditional on the denorm mode or the unsafe atomic attribute? That would need to be custom and verify those are consistent
foad (author, Done): Those checks are done in SITargetLowering::shouldExpandAtomicRMWInIR for both selectiondag and globalisel.

   // BUFFER/FLAT_ATOMIC_CMP_SWAP on GCN GPUs needs input marshalling, and output
   // demarshalling
   getActionDefinitionsBuilder(G_ATOMIC_CMPXCHG)
     .customFor({{S32, GlobalPtr}, {S64, GlobalPtr},
                 {S32, FlatPtr}, {S64, FlatPtr}})
     .legalFor({{S32, LocalPtr}, {S64, LocalPtr},
                {S32, RegionPtr}, {S64, RegionPtr}});
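
The G_ATOMICRMW_FADD rules after this change can be restated as a standalone predicate, which may make the feature gating easier to see. This is a sketch under assumptions, not LLVM code: `LegalizerSubtargetModel`, `AddrSpace` and `isLegalAtomicRMWFadd` are hypothetical names, the three flags stand in for `ST.hasLDSFPAtomics()`, `ST.hasGFX90AInsts()` and `ST.hasAtomicFaddInsts()`, and the denorm-mode and unsafe-atomic checks that the discussion attributes to `SITargetLowering::shouldExpandAtomicRMWInIR` are deliberately left out.

```cpp
// Hypothetical address-space tag; only the spaces named in the diff matter.
enum class AddrSpace { Global, Local, Region, Flat };

// Hypothetical stand-in for the subtarget feature queries in the diff.
struct LegalizerSubtargetModel {
  bool HasLDSFPAtomics;    // models ST.hasLDSFPAtomics()
  bool HasGFX90AInsts;     // models ST.hasGFX90AInsts()
  bool HasAtomicFaddInsts; // models ST.hasAtomicFaddInsts()
};

// Returns true when a (value size, pointer address space) pair would be
// marked legal by the rules in the diff.
constexpr bool isLegalAtomicRMWFadd(LegalizerSubtargetModel ST, unsigned Bits,
                                    AddrSpace AS) {
  if (ST.HasLDSFPAtomics) {
    // {S32, LocalPtr} and {S32, RegionPtr}
    if (Bits == 32 && (AS == AddrSpace::Local || AS == AddrSpace::Region))
      return true;
    // {S64, LocalPtr}, only nested under the LDS FP atomics check
    if (ST.HasGFX90AInsts && Bits == 64 && AS == AddrSpace::Local)
      return true;
  }
  // New in this revision: {S32, GlobalPtr}, gated independently of LDS support
  if (ST.HasAtomicFaddInsts && Bits == 32 && AS == AddrSpace::Global)
    return true;
  return false;
}
```

Hoisting the builder out of the `hasLDSFPAtomics()` block is what allows the global rule to be added without depending on LDS support; the predicate reflects that by testing `HasAtomicFaddInsts` outside the first branch.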

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-atomicrmw-fadd-global.mir

This file was added.

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx908 -O0 -run-pass=legalizer %s -o - | FileCheck %s
# RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx90a -O0 -run-pass=legalizer %s -o - | FileCheck %s

# RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=fiji -O0 -run-pass=legalizer -global-isel-abort=2 -pass-remarks-missed='gisel.*' -o /dev/null %s 2>&1 | FileCheck -check-prefix=ERR %s

# ERR: remark: <unknown>:0:0: unable to legalize instruction: %2:_(s32) = G_ATOMICRMW_FADD %0:_(p1), %1:_ :: (load store seq_cst 4, addrspace 1) (in function: atomicrmw_fadd_global_i32)

---
name: atomicrmw_fadd_global_i32

body: |
  bb.0:
    liveins: $sgpr0_sgpr1, $sgpr2
    ; CHECK-LABEL: name: atomicrmw_fadd_global_i32
    ; CHECK: [[COPY:%[0-9]+]]:_(p1) = COPY $sgpr0_sgpr1
    ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr2
    ; CHECK: [[ATOMICRMW_FADD:%[0-9]+]]:_(s32) = G_ATOMICRMW_FADD [[COPY]](p1), [[COPY1]] :: (load store seq_cst 4, addrspace 1)
    %0:_(p1) = COPY $sgpr0_sgpr1
    %1:_(s32) = COPY $sgpr2
    %2:_(s32) = G_ATOMICRMW_FADD %0, %1 :: (load store seq_cst 4, addrspace 1)
...

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-atomicrmw-fadd-local.mir

This file was moved from llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-atomicrmw-fadd.mir.

The contents of this file were not changed.

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-atomicrmw-fadd.mir

This file was moved to llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-atomicrmw-fadd-local.mir.