Details

Reviewers

tstellarAMD
nhaehnle
arsenm

Commits

rG94bba4c367b2: Merging r293000:
rG2f3f9855f0fd: AMDGPU add support for spilling to a user sgpr pointed buffers
rL293240: Merging r293000:
rL293000: AMDGPU add support for spilling to a user sgpr pointed buffers

Summary

This lets you select which sort of spilling you want: the spill buffer pointer passed directly in s[0:1], or a 64-bit pointer loaded from the address held in s[0:1].

Diff Detail

Event Timeline

airlied updated this revision to Diff 74108. Oct 9 2016, 11:53 PM

airlied retitled this revision from to AMDGPU add support for spilling to a user data SREG address..

airlied updated this object.

airlied added reviewers: arsenm, nhaehnle, tstellarAMD.

airlied set the repository for this revision to rL LLVM.

Herald edited edge metadata. Oct 9 2016, 11:53 PM

Herald added subscribers: tony-tye, yaxunl, wdng, kzhuravl.

FYI: The current mesa side of this is here:
https://github.com/airlied/mesa/tree/radv-wip-spilling

I think ABI changes should be encoded as part of the triple (a radv environment type?) rather than a subtarget feature. This is supposed to be just the pointer and not the full scratch resource descriptor?

Alternatively we could just switch mesa to always doing this and remove the relocations

In D25428#565912, @arsenm wrote:

I think ABI changes should be encoded as part of the triple (a radv environment type?) rather than a subtarget feature. This is supposed to be just the pointer and not the full scratch resource descriptor?

Hmm I'll try and investigate adding a radv env triple tomorrow.

Just a pointer; I don't have space for 4 user SGPRs here. An alternative would be to make the first descriptor behind the first user-SGPR pointer the spill descriptor, though I'm not sure whether that causes problems.

In D25428#565913, @arsenm wrote:

Alternatively we could just switch mesa to always doing this and remove the relocations

I'm not sure the GL driver has enough spare user sgprs to allow this to happen.

As mentioned on mesa-dev, I like the general approach, but instead of a machine feature to enable this I'd use a function attribute which designates one of the function arguments as the source for the spill pointer, e.g. "amdgpu-spill-ptr=4" to take the pointer from the fifth function argument (which would typically be SGPR 8/9).
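
A minimal sketch of how such an attribute value could map to an SGPR pair. The attribute name is taken from the comment above; the helper names, the struct, and above all the assumption that every argument before the spill pointer occupies exactly two SGPRs (a 64-bit pointer each) are hypothetical, and only illustrate why index 4 would typically land on SGPR 8/9.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of the SGPR pair that holds the spill pointer.
struct SgprPair {
  unsigned Lo; // first SGPR of the pair
  unsigned Hi; // second SGPR of the pair
};

// Parses the value part of a hypothetical "amdgpu-spill-ptr" attribute,
// e.g. "4". Returns -1 if the value is not a plain decimal number.
inline int parseSpillPtrArgIndex(const std::string &Value) {
  if (Value.empty())
    return -1;
  int Index = 0;
  for (char C : Value) {
    if (C < '0' || C > '9')
      return -1;
    Index = Index * 10 + (C - '0');
  }
  return Index;
}

// Assumption (not from the patch): each argument before the spill pointer is
// a 64-bit value occupying two consecutive SGPRs, starting at SGPR0.
inline SgprPair sgprPairForArg(unsigned ArgIndex) {
  return {2 * ArgIndex, 2 * ArgIndex + 1};
}
```

With these assumptions, the zero-based index 4 (the fifth argument) resolves to the pair starting at SGPR 8. A real implementation would have to walk the actual argument list, since arguments need not all be two SGPRs wide.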

In D25428#565918, @airlied wrote:

I'm not sure the GL driver has enough spare user sgprs to allow this to happen.

True, GL will actually need to do some indirect loading at least for compute shaders. Also, we should really set the size part of the descriptor correctly anyway, so there's not much benefit to loading a full descriptor instead of just a 64-bit pointer.

Updated the diff to spill to a buffer whose 64-bit address is stored in the buffer pointed to by SGPR0/1.
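
The difference between the direct and the indirect variant can be sketched with a toy memory model. Everything here is hypothetical illustration rather than code from the patch: `ToyMemory` stands in for constant memory, and the sketch assumes s0 carries the low half and s1 the high half of the 64-bit address.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model of constant memory: 64-bit address -> 64-bit value stored there.
using ToyMemory = std::map<uint64_t, uint64_t>;

// Combines two 32-bit SGPR values into one 64-bit address.
// Assumption: s0 is the low half, s1 the high half.
inline uint64_t makeAddress(uint32_t S0, uint32_t S1) {
  return (static_cast<uint64_t>(S1) << 32) | S0;
}

// Direct variant: s[0:1] already holds the spill buffer address.
inline uint64_t spillBaseDirect(uint32_t S0, uint32_t S1) {
  return makeAddress(S0, S1);
}

// Indirect variant: s[0:1] points at a location that holds the 64-bit
// spill buffer address, so one extra load is needed.
inline uint64_t spillBaseIndirect(const ToyMemory &Mem, uint32_t S0,
                                  uint32_t S1) {
  return Mem.at(makeAddress(S0, S1));
}
```

The indirect form costs one scalar load but only requires the driver to keep a single pointer in user SGPRs.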

Herald edited edge metadata. Oct 10 2016, 10:35 PM

arsenm added inline comments. Oct 13 2016, 6:17 AM

lib/Target/AMDGPU/SIFrameLowering.cpp
245–270	Formatting for this all looks wrong
247	This is confusingly named because it's not S_MOV_B64
248	You can just use the literal AMDGPU::SGPR0_SGPR1 if you're just using a known constant register. However, I would expect this to be getting the register from the MachineFunctionInfo rather than hardcoding it again inside FrameLowering
253	This shouldn't be undef, the read value does matter
256	Because this is using physical registers here, the sub register index isn't used (also sub0_sub1 of s[0:1] doesn't make sense)
256	Because you are creating a load you should add a MachineMemOperand or else it will unduly constrain the post-RA scheduler. There are a few other dummy places that create read-only memory operands for similar purposes. Also because it's a load I would hope we could do this much earlier than frame lowering. I'm thinking about how to properly initialize m0 in the HSA ABI which also requires a load so I will probably end up taking care of that eventually

airlied updated this revision to Diff 81697. Dec 15 2016, 5:53 PM

airlied retitled this revision from AMDGPU add support for spilling to a user data SREG address. to AMDGPU add support for spilling to a user sgpr pointed buffers.

airlied updated this object.

airlied edited edge metadata.

airlied removed rL LLVM as the repository for this revision.

Herald edited edge metadata. Dec 15 2016, 5:53 PM

This is missing the part touching the argument lowering. When the input sgpr0/1 is going to be used it needs to be marked in the initial argument lowering in LowerFormalArguments, or else the other argument usage might end up thinking it can use these

lib/Target/AMDGPU/SIFrameLowering.cpp
316	I think this needs a better name, that definitely doesn't involve the word spill. We currently mix the terms scratch and private memory. How about hasPrivateMemoryPointerInput()?
326	hasIndirectPrivateMemoryPointerInput? I assume you are putting other things in this buffer besides just this one pointer, so maybe the name should be whatever you want to call that buffer.
333–335	These go over 80 columns
338–340	Usually we put a comment with the name of the operand after each one for the more complicated instructions, e.g. // offset
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
132–135	I don't really like using the attributes this way naming specific registers. This needs to always be available, so I don't see why you need to explicitly enable this particularly in the indirect case.

mareko added a subscriber: mareko. Dec 28 2016, 3:07 PM

mareko added inline comments.

lib/Target/AMDGPU/SIFrameLowering.cpp
326	There are no other things in the scratch buffer. Only LLVM uses it. It doesn't matter what it's called from Mesa's point of view.
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
132–135	I don't understand the comment. Of course it's always available. The driver just chooses one of the methods: 1) scratch relocation; 2) the pointer is in SGPR01; 3) the pointer is at the address pointed to by SGPR01

This rewrites the patch, using a new triple OsType (needs renaming to something better), and adds an intrinsic to get access to the private buffer.

Herald edited edge metadata. Dec 28 2016, 7:18 PM

arsenm added inline comments. Dec 28 2016, 9:00 PM

include/llvm/IR/IntrinsicsAMDGPU.td
103–105	The name shouldn't include private because this is the pointer to the buffer with more than just the private resource descriptor.
lib/Target/AMDGPU/AMDGPUSubtarget.h
156	Did you try adding the version number to the end and checking that instead of adding a new OS?

Updates after talking to Tom on IRC; this still might cause a flag day.

LGTM except for some pedantry

lib/Target/AMDGPU/AMDGPUSubtarget.cpp
300	Looks like it's over 80 columns
lib/Target/AMDGPU/AMDGPUSubtarget.h
318	C++ style comment, capitalized
327	Extra line
lib/Target/AMDGPU/SIRegisterInfo.cpp
1114–1117	No return after else
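
The last point refers to the LLVM style rule of not using `else` after a `return`. A minimal sketch of the restructured branch, using a hypothetical stand-in struct instead of the real SIMachineFunctionInfo and subtarget classes:

```cpp
#include <cassert>

// Hypothetical stand-in for the fields consulted by getPreloadedValue().
struct ToyFunctionInfo {
  bool IsCodeObjectV2;
  unsigned PrivateSegmentBufferUserSGPR;
  unsigned PrivateMemoryPtrUserSGPR;
};

// PRIVATE_SEGMENT_BUFFER case written without 'else' after 'return'.
inline unsigned privateSegmentBufferReg(const ToyFunctionInfo &Info) {
  if (Info.IsCodeObjectV2)
    return Info.PrivateSegmentBufferUserSGPR;
  // No 'else' needed: the branch above has already returned.
  return Info.PrivateMemoryPtrUserSGPR;
}
```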

This revision is now accepted and ready to land. Jan 23 2017, 9:05 PM

fix pedantry
fix two bugs found in testing.

Fixed a regression in deciding when to emit the spill setup path: I should have just added the isMesaGfx option, not removed the other one.

Closed by commit rL293000: AMDGPU add support for spilling to a user sgpr pointed buffers (authored by tstellar). Jan 24 2017, 5:36 PM

This revision was automatically updated to reflect the committed changes.

Diff 85525

include/llvm/IR/IntrinsicsAMDGPU.td

Context not available.
	GCCBuiltin<"__builtin_amdgcn_dispatch_id">,	GCCBuiltin<"__builtin_amdgcn_dispatch_id">,
	Intrinsic<[llvm_i64_ty], [], [IntrNoMem]>;	Intrinsic<[llvm_i64_ty], [], [IntrNoMem]>;

		def int_amdgcn_implicit_buffer_ptr :
		GCCBuiltin<"__builtin_amdgcn_implicit_buffer_ptr">,
		Intrinsic<[LLVMQualPointerType<llvm_i8_ty, 2>], [], [IntrNoMem]>;

	//===----------------------------------------------------------------------===//	//===----------------------------------------------------------------------===//
	// Instruction Intrinsics	// Instruction Intrinsics
	//===----------------------------------------------------------------------===//	//===----------------------------------------------------------------------===//
Context not available.

lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

Context not available.
	void AMDGPUAsmPrinter::EmitFunctionBodyStart() {	void AMDGPUAsmPrinter::EmitFunctionBodyStart() {
	const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>();	const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>();
	SIProgramInfo KernelInfo;	SIProgramInfo KernelInfo;
	if (STM.isAmdCodeObjectV2()) {	if (STM.isAmdCodeObjectV2(*MF)) {
	getSIProgramInfo(KernelInfo, *MF);	getSIProgramInfo(KernelInfo, *MF);
	EmitAmdKernelCodeT(*MF, KernelInfo);	EmitAmdKernelCodeT(*MF, KernelInfo);
	}	}
Context not available.
	void AMDGPUAsmPrinter::EmitFunctionEntryLabel() {	void AMDGPUAsmPrinter::EmitFunctionEntryLabel() {
	const SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();	const SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
	const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>();	const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>();
	if (MFI->isKernel() && STM.isAmdCodeObjectV2()) {	if (MFI->isKernel() && STM.isAmdCodeObjectV2(*MF)) {
	AMDGPUTargetStreamer *TS =	AMDGPUTargetStreamer *TS =
	static_cast<AMDGPUTargetStreamer *>(OutStreamer->getTargetStreamer());	static_cast<AMDGPUTargetStreamer *>(OutStreamer->getTargetStreamer());
	SmallString<128> SymbolName;	SmallString<128> SymbolName;
Context not available.

	// FIXME: Should use getKernArgSize	// FIXME: Should use getKernArgSize
	header.kernarg_segment_byte_size =	header.kernarg_segment_byte_size =
	STM.getKernArgSegmentSize(MFI->getABIArgOffset());	STM.getKernArgSegmentSize(MF, MFI->getABIArgOffset());
	header.wavefront_sgpr_count = KernelInfo.NumSGPR;	header.wavefront_sgpr_count = KernelInfo.NumSGPR;
	header.workitem_vgpr_count = KernelInfo.NumVGPR;	header.workitem_vgpr_count = KernelInfo.NumVGPR;
	header.workitem_private_segment_byte_size = KernelInfo.ScratchSize;	header.workitem_private_segment_byte_size = KernelInfo.ScratchSize;
Context not available.

lib/Target/AMDGPU/AMDGPUSubtarget.h

Context not available.
	return EnableXNACK;	return EnableXNACK;
	}	}

	bool isAmdCodeObjectV2() const {	bool isMesaKernel(const MachineFunction &MF) const {
	return isAmdHsaOS() || isMesa3DOS();	return isMesa3DOS() && AMDGPU::isShader(MF.getFunction()->getCallingConv());
	}	}

		/* covers vs/ps/cs gfx shaders */
		bool isMesaGfxShader(const MachineFunction &MF) const {
		return isMesa3DOS() && AMDGPU::isShader(MF.getFunction()->getCallingConv());
		}

		bool isAmdCodeObjectV2(const MachineFunction &MF) const {
		return isAmdHsaOS() || isMesaKernel(MF);
		}


	/// \brief Returns the offset in bytes from the start of the input buffer	/// \brief Returns the offset in bytes from the start of the input buffer
	/// of the first explicit kernel argument.	/// of the first explicit kernel argument.
	unsigned getExplicitKernelArgOffset() const {	unsigned getExplicitKernelArgOffset(const MachineFunction &MF) const {
	return isAmdCodeObjectV2() ? 0 : 36;	return isAmdCodeObjectV2(MF) ? 0 : 36;
	}	}

	unsigned getAlignmentForImplicitArgPtr() const {	unsigned getAlignmentForImplicitArgPtr() const {
	return isAmdHsaOS() ? 8 : 4;	return isAmdHsaOS() ? 8 : 4;
	}	}

	unsigned getImplicitArgNumBytes() const {	unsigned getImplicitArgNumBytes(const MachineFunction &MF) const {
	if (isMesa3DOS())	if (isMesaKernel(MF))
	return 16;	return 16;
	if (isAmdHsaOS() && isOpenCLEnv())	if (isAmdHsaOS() && isOpenCLEnv())
	return 32;	return 32;
Context not available.
	return getGeneration() != AMDGPUSubtarget::SOUTHERN_ISLANDS;	return getGeneration() != AMDGPUSubtarget::SOUTHERN_ISLANDS;
	}	}

	unsigned getKernArgSegmentSize(unsigned ExplictArgBytes) const;	unsigned getKernArgSegmentSize(const MachineFunction &MF, unsigned ExplictArgBytes) const;

	/// Return the maximum number of waves per SIMD for kernels using \p SGPRs SGPRs	/// Return the maximum number of waves per SIMD for kernels using \p SGPRs SGPRs
	unsigned getOccupancyWithNumSGPRs(unsigned SGPRs) const;	unsigned getOccupancyWithNumSGPRs(unsigned SGPRs) const;
Context not available.
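
A minimal sketch of how the three predicates in this hunk are presumably meant to partition functions on Mesa. All types are hypothetical stand-ins, not the LLVM classes. One assumption is flagged loudly: `isMesaKernel` is modeled here with a negated shader check, so that it and `isMesaGfxShader` are mutually exclusive, which is what the `assert(!ST.isAmdCodeObjectV2(MF))` inside the `isMesaGfxShader` branch of the SIFrameLowering.cpp hunk implies. The hunk as displayed shows the same expression for both predicates.

```cpp
#include <cassert>

// Hypothetical, simplified OS tag; the real subtarget derives this from the
// target triple.
enum class ToyOS { AmdHsa, Mesa3D, Other };

struct ToySubtarget {
  ToyOS OS;

  bool isAmdHsaOS() const { return OS == ToyOS::AmdHsa; }
  bool isMesa3DOS() const { return OS == ToyOS::Mesa3D; }

  // ASSUMPTION: a Mesa "kernel" is a function whose calling convention is
  // not one of the shader conventions.
  bool isMesaKernel(bool IsShaderCC) const {
    return isMesa3DOS() && !IsShaderCC;
  }

  // Covers VS/PS/CS graphics shaders, as in the hunk.
  bool isMesaGfxShader(bool IsShaderCC) const {
    return isMesa3DOS() && IsShaderCC;
  }

  bool isAmdCodeObjectV2(bool IsShaderCC) const {
    return isAmdHsaOS() || isMesaKernel(IsShaderCC);
  }

  // Offset in bytes of the first explicit kernel argument, as in the hunk.
  unsigned getExplicitKernelArgOffset(bool IsShaderCC) const {
    return isAmdCodeObjectV2(IsShaderCC) ? 0 : 36;
  }
};
```

Under this reading a Mesa graphics shader is never treated as code object v2, so it takes the new private-memory-pointer path instead of the HSA-style preloaded buffer.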

lib/Target/AMDGPU/AMDGPUSubtarget.cpp

Context not available.
	return EnableVGPRSpilling || !AMDGPU::isShader(F.getCallingConv());	return EnableVGPRSpilling || !AMDGPU::isShader(F.getCallingConv());
	}	}

	unsigned SISubtarget::getKernArgSegmentSize(unsigned ExplicitArgBytes) const {	unsigned SISubtarget::getKernArgSegmentSize(const MachineFunction &MF, unsigned ExplicitArgBytes) const {
	unsigned ImplicitBytes = getImplicitArgNumBytes();	unsigned ImplicitBytes = getImplicitArgNumBytes(MF);
	if (ImplicitBytes == 0)	if (ImplicitBytes == 0)
	return ExplicitArgBytes;	return ExplicitArgBytes;

Context not available.
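
Only the early return of `getKernArgSegmentSize` is visible in this hunk. The sketch below fills in the rest under an assumption that is not shown in the diff: that the explicit bytes are rounded up to the implicit-argument alignment before the implicit bytes are appended. The function and helper names are hypothetical; the sizes 16 and 32 and the alignments 4 and 8 used in the checks come from `getImplicitArgNumBytes` and `getAlignmentForImplicitArgPtr` in the AMDGPUSubtarget.h hunk.

```cpp
#include <cassert>

// Rounds Value up to the next multiple of Align (Align must be non-zero).
inline unsigned alignToToy(unsigned Value, unsigned Align) {
  return (Value + Align - 1) / Align * Align;
}

// Sketch of the kernarg segment size computation.
// ASSUMPTION: implicit arguments follow the explicit ones at the next
// ImplicitArgAlign boundary; only the early return is taken from the diff.
inline unsigned kernArgSegmentSize(unsigned ExplicitArgBytes,
                                   unsigned ImplicitBytes,
                                   unsigned ImplicitArgAlign) {
  if (ImplicitBytes == 0)
    return ExplicitArgBytes;
  return alignToToy(ExplicitArgBytes, ImplicitArgAlign) + ImplicitBytes;
}
```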

lib/Target/AMDGPU/R600ISelLowering.cpp

Context not available.

	unsigned ValBase = ArgLocs[In.getOrigArgIndex()].getLocMemOffset();	unsigned ValBase = ArgLocs[In.getOrigArgIndex()].getLocMemOffset();
	unsigned PartOffset = VA.getLocMemOffset();	unsigned PartOffset = VA.getLocMemOffset();
	unsigned Offset = Subtarget->getExplicitKernelArgOffset() + VA.getLocMemOffset();	unsigned Offset = Subtarget->getExplicitKernelArgOffset(MF) + VA.getLocMemOffset();

	MachinePointerInfo PtrInfo(UndefValue::get(PtrTy), PartOffset - ValBase);	MachinePointerInfo PtrInfo(UndefValue::get(PtrTy), PartOffset - ValBase);
	SDValue Arg = DAG.getLoad(	SDValue Arg = DAG.getLoad(
Context not available.

lib/Target/AMDGPU/SIFrameLowering.cpp

Context not available.


	unsigned PreloadedPrivateBufferReg = AMDGPU::NoRegister;	unsigned PreloadedPrivateBufferReg = AMDGPU::NoRegister;
	if (ST.isAmdCodeObjectV2()) {	if (ST.isAmdCodeObjectV2(MF) || ST.isMesaGfxShader(MF)) {
	PreloadedPrivateBufferReg = TRI->getPreloadedValue(	PreloadedPrivateBufferReg = TRI->getPreloadedValue(
	MF, SIRegisterInfo::PRIVATE_SEGMENT_BUFFER);	MF, SIRegisterInfo::PRIVATE_SEGMENT_BUFFER);
	}	}
Context not available.
	}	}

	if (ResourceRegUsed && PreloadedPrivateBufferReg != AMDGPU::NoRegister) {	if (ResourceRegUsed && PreloadedPrivateBufferReg != AMDGPU::NoRegister) {
	assert(ST.isAmdCodeObjectV2());	assert(ST.isAmdCodeObjectV2(MF) || ST.isMesaGfxShader(MF));
	MRI.addLiveIn(PreloadedPrivateBufferReg);	MRI.addLiveIn(PreloadedPrivateBufferReg);
	MBB.addLiveIn(PreloadedPrivateBufferReg);	MBB.addLiveIn(PreloadedPrivateBufferReg);
	}	}
Context not available.

	bool CopyBuffer = ResourceRegUsed &&	bool CopyBuffer = ResourceRegUsed &&
	PreloadedPrivateBufferReg != AMDGPU::NoRegister &&	PreloadedPrivateBufferReg != AMDGPU::NoRegister &&
		ST.isAmdCodeObjectV2(MF) &&
	ScratchRsrcReg != PreloadedPrivateBufferReg;	ScratchRsrcReg != PreloadedPrivateBufferReg;

	// This needs to be careful of the copying order to avoid overwriting one of	// This needs to be careful of the copying order to avoid overwriting one of
Context not available.
	.addReg(PreloadedPrivateBufferReg, RegState::Kill);	.addReg(PreloadedPrivateBufferReg, RegState::Kill);
	}	}

	if (ResourceRegUsed && PreloadedPrivateBufferReg == AMDGPU::NoRegister) {	if (ResourceRegUsed && ST.isMesaGfxShader(MF)) {
	assert(!ST.isAmdCodeObjectV2());	assert(!ST.isAmdCodeObjectV2(MF));
	const MCInstrDesc &SMovB32 = TII->get(AMDGPU::S_MOV_B32);	const MCInstrDesc &SMovB32 = TII->get(AMDGPU::S_MOV_B32);

	unsigned Rsrc0 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub0);
	unsigned Rsrc1 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub1);
	unsigned Rsrc2 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub2);	unsigned Rsrc2 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub2);
	unsigned Rsrc3 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub3);	unsigned Rsrc3 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub3);

	// Use relocations to get the pointer, and setup the other bits manually.	// Use relocations to get the pointer, and setup the other bits manually.
	uint64_t Rsrc23 = TII->getScratchRsrcWords23();	uint64_t Rsrc23 = TII->getScratchRsrcWords23();
	BuildMI(MBB, I, DL, SMovB32, Rsrc0)
	.addExternalSymbol("SCRATCH_RSRC_DWORD0")
	.addReg(ScratchRsrcReg, RegState::ImplicitDefine);

	BuildMI(MBB, I, DL, SMovB32, Rsrc1)	if (MFI->hasPrivateMemoryInputPtr()) {
	.addExternalSymbol("SCRATCH_RSRC_DWORD1")	unsigned Rsrc01 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub0_sub1);
	.addReg(ScratchRsrcReg, RegState::ImplicitDefine);
		if (AMDGPU::isCompute(MF.getFunction()->getCallingConv())) {
		const MCInstrDesc &Mov64 = TII->get(AMDGPU::S_MOV_B64);

		BuildMI(MBB, I, DL, Mov64, Rsrc01)
		.addReg(PreloadedPrivateBufferReg)
		.addReg(ScratchRsrcReg, RegState::ImplicitDefine);
		} else {
		const MCInstrDesc &LoadDwordX2 = TII->get(AMDGPU::S_LOAD_DWORDX2_IMM);

		PointerType *PtrTy =
		PointerType::get(Type::getInt64Ty(MF.getFunction()->getContext()),
		AMDGPUAS::CONSTANT_ADDRESS);
		MachinePointerInfo PtrInfo(UndefValue::get(PtrTy));
		auto MMO = MF.getMachineMemOperand(PtrInfo,
		MachineMemOperand::MOLoad |
		MachineMemOperand::MOInvariant |
		MachineMemOperand::MODereferenceable,
		0, 0);
		BuildMI(MBB, I, DL, LoadDwordX2, Rsrc01)
		.addReg(PreloadedPrivateBufferReg)
		.addImm(0) // offset
		.addImm(0) // glc
		.addMemOperand(MMO)
		.addReg(ScratchRsrcReg, RegState::ImplicitDefine);
		}
		} else {
		unsigned Rsrc0 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub0);
		unsigned Rsrc1 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub1);

		BuildMI(MBB, I, DL, SMovB32, Rsrc0)
		.addExternalSymbol("SCRATCH_RSRC_DWORD0")
		.addReg(ScratchRsrcReg, RegState::ImplicitDefine);

		BuildMI(MBB, I, DL, SMovB32, Rsrc1)
		.addExternalSymbol("SCRATCH_RSRC_DWORD1")
		.addReg(ScratchRsrcReg, RegState::ImplicitDefine);

		}

	BuildMI(MBB, I, DL, SMovB32, Rsrc2)	BuildMI(MBB, I, DL, SMovB32, Rsrc2)
	.addImm(Rsrc23 & 0xffffffff)	.addImm(Rsrc23 & 0xffffffff)
Context not available.
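
The descriptor setup in this hunk amounts to filling four 32-bit words: words 0 and 1 from the scratch pointer (Rsrc01), words 2 and 3 from the constant returned by `getScratchRsrcWords23()`. A toy sketch of that split follows. The function name is hypothetical, the high half `Rsrc23 >> 32` for word 3 is an assumption because only the `Rsrc23 & 0xffffffff` line is visible here, and the sketch ignores any extra descriptor fields that real hardware packs into word 1.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assembles the four 32-bit words of a toy scratch resource descriptor from
// a 64-bit base pointer (words 0-1) and a 64-bit constant (words 2-3).
inline std::array<uint32_t, 4> buildScratchRsrc(uint64_t BasePtr,
                                                uint64_t Rsrc23) {
  return {{static_cast<uint32_t>(BasePtr & 0xffffffff), // Rsrc0
           static_cast<uint32_t>(BasePtr >> 32),        // Rsrc1
           static_cast<uint32_t>(Rsrc23 & 0xffffffff),  // Rsrc2
           static_cast<uint32_t>(Rsrc23 >> 32)}};       // Rsrc3 (assumed)
}
```

In the relocation path the first two words come from SCRATCH_RSRC_DWORD0 and SCRATCH_RSRC_DWORD1 instead, while the last two are set the same way in every path.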

lib/Target/AMDGPU/SIISelLowering.cpp

Context not available.
	!Info->hasWorkItemIDZ());	!Info->hasWorkItemIDZ());
	}	}

		if (Info->hasPrivateMemoryInputPtr()) {
		unsigned PrivateMemoryPtrReg = Info->addPrivateMemoryPtr(*TRI);
		MF.addLiveIn(PrivateMemoryPtrReg, &AMDGPU::SReg_64RegClass);
		CCInfo.AllocateReg(PrivateMemoryPtrReg);
		}

	// FIXME: How should these inputs interact with inreg / custom SGPR inputs?	// FIXME: How should these inputs interact with inreg / custom SGPR inputs?
	if (Info->hasPrivateSegmentBuffer()) {	if (Info->hasPrivateSegmentBuffer()) {
	unsigned PrivateSegmentBufferReg = Info->addPrivateSegmentBuffer(*TRI);	unsigned PrivateSegmentBufferReg = Info->addPrivateSegmentBuffer(*TRI);
Context not available.
	if (VA.isMemLoc()) {	if (VA.isMemLoc()) {
	VT = Ins[i].VT;	VT = Ins[i].VT;
	EVT MemVT = VA.getLocVT();	EVT MemVT = VA.getLocVT();
	const unsigned Offset = Subtarget->getExplicitKernelArgOffset() +	const unsigned Offset = Subtarget->getExplicitKernelArgOffset(MF) +
	VA.getLocMemOffset();	VA.getLocMemOffset();
	// The first 36 bytes of the input buffer contains information about	// The first 36 bytes of the input buffer contains information about
	// thread group and global sizes.	// thread group and global sizes.
Context not available.
	if (getTargetMachine().getOptLevel() == CodeGenOpt::None)	if (getTargetMachine().getOptLevel() == CodeGenOpt::None)
	HasStackObjects = true;	HasStackObjects = true;

	if (ST.isAmdCodeObjectV2()) {	if (ST.isAmdCodeObjectV2(MF)) {
	if (HasStackObjects) {	if (HasStackObjects) {
	// If we have stack objects, we unquestionably need the private buffer	// If we have stack objects, we unquestionably need the private buffer
	// resource. For the Code Object V2 ABI, this will be the first 4 user	// resource. For the Code Object V2 ABI, this will be the first 4 user
Context not available.
	// TODO: Should this propagate fast-math-flags?	// TODO: Should this propagate fast-math-flags?

	switch (IntrinsicID) {	switch (IntrinsicID) {
		case Intrinsic::amdgcn_implicit_buffer_ptr: {
		unsigned Reg = TRI->getPreloadedValue(MF, SIRegisterInfo::PRIVATE_SEGMENT_BUFFER);
		return CreateLiveInRegister(DAG, &AMDGPU::SReg_64RegClass, Reg, VT);
		}
	case Intrinsic::amdgcn_dispatch_ptr:	case Intrinsic::amdgcn_dispatch_ptr:
	case Intrinsic::amdgcn_queue_ptr: {	case Intrinsic::amdgcn_queue_ptr: {
	if (!Subtarget->isAmdCodeObjectV2()) {	if (!Subtarget->isAmdCodeObjectV2(MF)) {
	DiagnosticInfoUnsupported BadIntrin(	DiagnosticInfoUnsupported BadIntrin(
	*MF.getFunction(), "unsupported hsa intrinsic without hsa target",	*MF.getFunction(), "unsupported hsa intrinsic without hsa target",
	DL.getDebugLoc());	DL.getDebugLoc());
Context not available.

lib/Target/AMDGPU/SIMachineFunctionInfo.h

Context not available.
	unsigned ScratchRSrcReg;	unsigned ScratchRSrcReg;
	unsigned ScratchWaveOffsetReg;	unsigned ScratchWaveOffsetReg;

		// Input registers for non-HSA ABI
		unsigned PrivateMemoryPtrUserSGPR;

	// Input registers setup for the HSA ABI.	// Input registers setup for the HSA ABI.
	// User SGPRs in allocation order.	// User SGPRs in allocation order.
	unsigned PrivateSegmentBufferUserSGPR;	unsigned PrivateSegmentBufferUserSGPR;
Context not available.
	bool WorkItemIDY : 1;	bool WorkItemIDY : 1;
	bool WorkItemIDZ : 1;	bool WorkItemIDZ : 1;

		// Private memory buffer
		// Compute directly in sgpr[0:1]
		// Other shaders indirect 64-bits at sgpr[0:1]
		bool PrivateMemoryInputPtr : 1;

	MCPhysReg getNextUserSGPR() const {	MCPhysReg getNextUserSGPR() const {
	assert(NumSystemSGPRs == 0 && "System SGPRs must be added after user SGPRs");	assert(NumSystemSGPRs == 0 && "System SGPRs must be added after user SGPRs");
	return AMDGPU::SGPR0 + NumUserSGPRs;	return AMDGPU::SGPR0 + NumUserSGPRs;
Context not available.
	unsigned addKernargSegmentPtr(const SIRegisterInfo &TRI);	unsigned addKernargSegmentPtr(const SIRegisterInfo &TRI);
	unsigned addDispatchID(const SIRegisterInfo &TRI);	unsigned addDispatchID(const SIRegisterInfo &TRI);
	unsigned addFlatScratchInit(const SIRegisterInfo &TRI);	unsigned addFlatScratchInit(const SIRegisterInfo &TRI);
		unsigned addPrivateMemoryPtr(const SIRegisterInfo &TRI);

	// Add system SGPRs.	// Add system SGPRs.
	unsigned addWorkGroupIDX() {	unsigned addWorkGroupIDX() {
Context not available.
	return WorkItemIDZ;	return WorkItemIDZ;
	}	}

		bool hasPrivateMemoryInputPtr() const {
		return PrivateMemoryInputPtr;
		}

	unsigned getNumUserSGPRs() const {	unsigned getNumUserSGPRs() const {
	return NumUserSGPRs;	return NumUserSGPRs;
	}	}
Context not available.
	return QueuePtrUserSGPR;	return QueuePtrUserSGPR;
	}	}

		unsigned getPrivateMemoryPtrUserSGPR() const {
		return PrivateMemoryPtrUserSGPR;
		}

	bool hasSpilledSGPRs() const {	bool hasSpilledSGPRs() const {
	return HasSpilledSGPRs;	return HasSpilledSGPRs;
	}	}
Context not available.

lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

Context not available.
	PrivateSegmentWaveByteOffset(false),	PrivateSegmentWaveByteOffset(false),
	WorkItemIDX(false),	WorkItemIDX(false),
	WorkItemIDY(false),	WorkItemIDY(false),
	WorkItemIDZ(false) {	WorkItemIDZ(false),
		PrivateMemoryInputPtr(false) {
	const SISubtarget &ST = MF.getSubtarget<SISubtarget>();	const SISubtarget &ST = MF.getSubtarget<SISubtarget>();
	const Function *F = MF.getFunction();	const Function *F = MF.getFunction();

Context not available.
	if (HasStackObjects || MaySpill)	if (HasStackObjects || MaySpill)
	PrivateSegmentWaveByteOffset = true;	PrivateSegmentWaveByteOffset = true;

	if (ST.isAmdCodeObjectV2()) {	if (ST.isAmdCodeObjectV2(MF)) {
	if (HasStackObjects || MaySpill)	if (HasStackObjects || MaySpill)
	PrivateSegmentBuffer = true;	PrivateSegmentBuffer = true;

Context not available.

	if (F->hasFnAttribute("amdgpu-dispatch-id"))	if (F->hasFnAttribute("amdgpu-dispatch-id"))
	DispatchID = true;	DispatchID = true;
		} else if (ST.isMesaGfxShader(MF)) {
		if (HasStackObjects || MaySpill)
		PrivateMemoryInputPtr = true;
	}	}

	// We don't need to worry about accessing spills with flat instructions.	// We don't need to worry about accessing spills with flat instructions.
Context not available.
	return FlatScratchInitUserSGPR;	return FlatScratchInitUserSGPR;
	}	}

		unsigned SIMachineFunctionInfo::addPrivateMemoryPtr(const SIRegisterInfo &TRI) {
		PrivateMemoryPtrUserSGPR = TRI.getMatchingSuperReg(
		getNextUserSGPR(), AMDGPU::sub0, &AMDGPU::SReg_64RegClass);
		NumUserSGPRs += 2;
		return PrivateMemoryPtrUserSGPR;
		}

	SIMachineFunctionInfo::SpilledReg SIMachineFunctionInfo::getSpilledReg (	SIMachineFunctionInfo::SpilledReg SIMachineFunctionInfo::getSpilledReg (
	MachineFunction *MF,	MachineFunction *MF,
	unsigned FrameIndex,	unsigned FrameIndex,
Context not available.
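
The bookkeeping done by `addPrivateMemoryPtr` can be sketched with a toy allocator. The struct is a hypothetical stand-in that tracks plain SGPR indices instead of LLVM register enums: the pointer takes the next free user SGPR as the low half of a 64-bit pair, and the counter advances by two so that later inputs cannot land on the same registers.

```cpp
#include <cassert>

// Hypothetical stand-in for the user SGPR counter in SIMachineFunctionInfo.
struct ToyUserSgprAllocator {
  unsigned NumUserSGPRs = 0;

  // Index of the next unallocated user SGPR (SGPR0 + NumUserSGPRs).
  unsigned nextUserSGPR() const { return NumUserSGPRs; }

  // Reserves a 64-bit pair for the private memory pointer and returns the
  // index of its low half.
  unsigned addPrivateMemoryPtr() {
    unsigned First = nextUserSGPR();
    NumUserSGPRs += 2;
    return First;
  }
};
```

When the pointer is added first, as in the SIISelLowering.cpp hunk, it ends up in SGPR0/1 and the next input starts at SGPR2.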

lib/Target/AMDGPU/SIRegisterInfo.cpp

Context not available.
	case SIRegisterInfo::PRIVATE_SEGMENT_WAVE_BYTE_OFFSET:	case SIRegisterInfo::PRIVATE_SEGMENT_WAVE_BYTE_OFFSET:
	return MFI->PrivateSegmentWaveByteOffsetSystemSGPR;	return MFI->PrivateSegmentWaveByteOffsetSystemSGPR;
	case SIRegisterInfo::PRIVATE_SEGMENT_BUFFER:	case SIRegisterInfo::PRIVATE_SEGMENT_BUFFER:
	assert(ST.isAmdCodeObjectV2() &&	if (ST.isAmdCodeObjectV2(MF)) {
	"Non-CodeObjectV2 ABI currently uses relocations");	assert(MFI->hasPrivateSegmentBuffer());
	assert(MFI->hasPrivateSegmentBuffer());	return MFI->PrivateSegmentBufferUserSGPR;
	return MFI->PrivateSegmentBufferUserSGPR;	} else {
		assert(MFI->hasPrivateMemoryInputPtr());
		return MFI->PrivateMemoryPtrUserSGPR;
		}
	case SIRegisterInfo::KERNARG_SEGMENT_PTR:	case SIRegisterInfo::KERNARG_SEGMENT_PTR:
	assert(MFI->hasKernargSegmentPtr());	assert(MFI->hasKernargSegmentPtr());
	return MFI->KernargSegmentPtrUserSGPR;	return MFI->KernargSegmentPtrUserSGPR;
Context not available.

This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU add support for spilling to a user sgpr pointed buffers
ClosedPublic
