Details

Reviewers

• tstellarAMD
nhaehnle
arsenm

Commits

rG94bba4c367b2: Merging r293000:
rG2f3f9855f0fd: AMDGPU add support for spilling to a user sgpr pointed buffers
rL293240: Merging r293000:
rL293000: AMDGPU add support for spilling to a user sgpr pointed buffers

Summary

This lets you select which sort of spilling you want: either the scratch pointer is passed directly in s[0:1], or it is fetched with a 64-bit load from the address held in s[0:1].

Diff Detail

Repository: rL LLVM

Event Timeline

airlied updated this revision to Diff 74108. Oct 9 2016, 11:53 PM

airlied retitled this revision to "AMDGPU add support for spilling to a user data SREG address."

airlied updated this object.

airlied added reviewers: arsenm, nhaehnle, • tstellarAMD.

airlied set the repository for this revision to rL LLVM.

Herald edited edge metadata. Oct 9 2016, 11:53 PM

Herald added subscribers: tony-tye, yaxunl, wdng, kzhuravl.

FYI: The current mesa side of this is here:
https://github.com/airlied/mesa/tree/radv-wip-spilling

I think ABI changes should be encoded as part of the triple (a radv environment type?) rather than a subtarget feature. This is supposed to be just the pointer and not the full scratch resource descriptor?

Alternatively we could just switch mesa to always doing this and remove the relocations

In D25428#565912, @arsenm wrote:

I think ABI changes should be encoded as part of the triple (a radv environment type?) rather than a subtarget feature. This is supposed to be just the pointer and not the full scratch resource descriptor?

Hmm I'll try and investigate adding a radv env triple tomorrow.

Just a pointer, I don't have space for 4 user gprs here. An alternative plan of action is to make the first descriptor on the first pointer in the user sgprs to point to the spill descriptor, I'm not sure if that causes problems though.

In D25428#565913, @arsenm wrote:

Alternatively we could just switch mesa to always doing this and remove the relocations

I'm not sure the GL driver has enough spare user sgprs to allow this to happen.

As mentioned on mesa-dev, I like the general approach, but instead of a machine feature to enable this I'd use a function attribute which designates one of the function arguments as the source for the spill pointer, e.g. "amdgpu-spill-ptr=4" to take the pointer from the fifth function argument (which would typically be SGPR 8/9).
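To make the numbering in that suggestion concrete, here is a minimal sketch of how an argument index could map to an SGPR pair. It assumes every argument before the spill pointer is a 64-bit value occupying two consecutive user SGPRs starting at SGPR0; the helper name `spillPtrSgprPair` and the cap of eight arguments are hypothetical and not part of LLVM or of this patch.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper, not an LLVM API. Maps a zero-based argument index to
// the SGPR pair that would hold it, ASSUMING each earlier argument takes
// exactly two consecutive user SGPRs and allocation starts at SGPR0.
std::pair<unsigned, unsigned> spillPtrSgprPair(unsigned ArgIndex) {
  // Assumed cap for this sketch only: eight 64-bit arguments.
  assert(ArgIndex < 8 && "argument index outside the range modelled here");
  unsigned Lo = ArgIndex * 2; // two SGPRs per 64-bit argument
  return {Lo, Lo + 1};
}
```

Under those assumptions, "amdgpu-spill-ptr=4" selects the fifth argument, and `spillPtrSgprPair(4)` yields the pair (8, 9), matching the "SGPR 8/9" mentioned above.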

In D25428#565918, @airlied wrote:

I'm not sure the GL driver has enough spare user sgprs to allow this to happen.

True, GL will actually need to do some indirect loading at least for compute shaders. Also, we should really set the size part of the descriptor correctly anyway, so there's not much benefit to loading a full descriptor instead of just a 64-bit pointer.

Updated diff to spill to a 64-bit buffer, pointed to by 64-bits in the buffer pointed to by SGPR0/1.

Herald edited edge metadata. Oct 10 2016, 10:35 PM

arsenm added inline comments. Oct 13 2016, 6:17 AM

lib/Target/AMDGPU/SIFrameLowering.cpp
245–270	Formatting for this all looks wrong
247	This is confusingly named because it's not S_MOV_B64
248	You can just use the literal AMDGPU::SGPR0_SGPR1 if you're just using a known constant register. However, I would expect this to be getting the register from the MachineFunctionInfo rather than hardcoding it again inside FrameLowering
253	This shouldn't be undef, the read value does matter
256	Because this is using physical registers here, the sub register index isn't used (also sub0_sub1 of s[0:1] doesn't make sense)
256	Because you are creating a load you should add a MachineMemOperand or else it will unduly constrain the post-RA scheduler. There are a few other dummy places that create read-only memory operands for similar purposes. Also because it's a load I would hope we could do this much earlier than frame lowering. I'm thinking about how to properly initialize m0 in the HSA ABI which also requires a load so I will probably end up taking care of that eventually

airlied updated this revision to Diff 81697. Dec 15 2016, 5:53 PM

airlied retitled this revision from AMDGPU add support for spilling to a user data SREG address. to AMDGPU add support for spilling to a user sgpr pointed buffers.

airlied updated this object.

airlied edited edge metadata.

airlied removed rL LLVM as the repository for this revision.

Herald edited edge metadata. Dec 15 2016, 5:53 PM

This is missing the part touching the argument lowering. When the input sgpr0/1 is going to be used it needs to be marked in the initial argument lowering in LowerFormalArguments, or else the other argument usage might end up thinking it can use these
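The concern can be illustrated with a toy model of user SGPR bookkeeping, loosely patterned on the `getNextUserSGPR()` accessor that appears in the diff. This is only a sketch: `UserSgprAllocator`, `reserve`, and the limit of 16 user SGPRs are assumptions made for the example, not the real `SIMachineFunctionInfo` interface.

```cpp
#include <cassert>

// Toy stand-in for user-SGPR bookkeeping; NOT the real SIMachineFunctionInfo.
struct UserSgprAllocator {
  static constexpr unsigned MaxUserSGPRs = 16; // assumed limit for this sketch
  unsigned NumUserSGPRs = 0;

  // Index of the next free user SGPR (0 means SGPR0).
  unsigned getNextUserSGPR() const { return NumUserSGPRs; }

  // Claims Count consecutive SGPRs and returns the index of the first one.
  unsigned reserve(unsigned Count) {
    assert(NumUserSGPRs + Count <= MaxUserSGPRs && "out of user SGPRs");
    unsigned First = NumUserSGPRs;
    NumUserSGPRs += Count;
    return First;
  }
};
```

If the scratch pointer is reserved first with `reserve(2)`, it lands in SGPR0/1 and the next 64-bit argument starts at SGPR2. Skipping that reservation would let the next argument claim SGPR0/1 itself, which is the clash described above.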

lib/Target/AMDGPU/SIFrameLowering.cpp
316	I think this needs a better name, that definitely doesn't involve the word spill. We currently mix the terms scratch and private memory. How about hasPrivateMemoryPointerInput()?
326	hasIndirectPrivateMemoryPointerInput? I assume you are putting other things in this buffer besides just this one pointer, so maybe the name should be whatever you want to call that buffer.
333–335	These go over 80 columns
338–340	Usually we put a comment with the name of the operand after each one for the more complicated instructions, e.g. // offset
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
132–135	I don't really like using the attributes this way naming specific registers. This needs to always be available, so I don't see why you need to explicitly enable this particularly in the indirect case.

mareko added a subscriber: mareko. Dec 28 2016, 3:07 PM

mareko added inline comments.

lib/Target/AMDGPU/SIFrameLowering.cpp
326	There are no other things in the scratch buffer. Only LLVM uses it. It doesn't matter what it's called from the Mesa's point of view.
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
132–135	I don't understand the comment. Of course it's always available. The driver just chooses one of the methods: 1) scratch relocation; 2) the pointer is in SGPR01; 3) the pointer is at the address pointed to by SGPR01
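The three methods listed in that comment can be modelled as follows. This is a self-contained sketch rather than code from the patch: the enum, the `WaveState` struct, and the map standing in for memory are all hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// The three ways the driver can supply the scratch pointer, following the
// list in the comment above. All names here are illustrative only.
enum class ScratchPtrSource {
  Relocation,      // 1) pointer patched in through a scratch relocation
  InSgpr01,        // 2) pointer passed directly in s[0:1]
  LoadedViaSgpr01  // 3) pointer stored at the address held in s[0:1]
};

// Toy model of the state visible to the shader prologue.
struct WaveState {
  uint64_t RelocatedBase = 0;          // value a relocation would supply
  uint64_t Sgpr01 = 0;                 // 64-bit value held in s[0:1]
  std::map<uint64_t, uint64_t> Memory; // address -> 64-bit value stored there
};

uint64_t resolveScratchBase(ScratchPtrSource Src, const WaveState &W) {
  switch (Src) {
  case ScratchPtrSource::Relocation:
    return W.RelocatedBase;
  case ScratchPtrSource::InSgpr01:
    return W.Sgpr01;
  case ScratchPtrSource::LoadedViaSgpr01: {
    // One extra level of indirection: s[0:1] holds an address, and the
    // scratch pointer is the 64-bit value stored at that address.
    auto It = W.Memory.find(W.Sgpr01);
    assert(It != W.Memory.end() && "s[0:1] must point at a stored pointer");
    return It->second;
  }
  }
  return 0; // not reached for valid enumerators
}
```

Only the third case needs a memory load in the prologue, which is why the review asks for that load to carry a MachineMemOperand.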

This rewrites the patch, using a new triple OsType (needs renaming to something better), and adds an intrinsic to get access to the private buffer.

Herald edited edge metadata. Dec 28 2016, 7:18 PM

arsenm added inline comments. Dec 28 2016, 9:00 PM

include/llvm/IR/IntrinsicsAMDGPU.td
103–105 ↗	(On Diff #82643)	The name shouldn't include private because this is the pointer to the buffer with more than just the private resource descriptor.
lib/Target/AMDGPU/AMDGPUSubtarget.h
156	Did you try adding the version number to the end and checking that instead of adding a new OS?

Updates after talking to Tom on IRC; this still might cause a flag day.

LGTM except for some pedantry

lib/Target/AMDGPU/AMDGPUSubtarget.cpp
300	Looks like it's over 80 columns
lib/Target/AMDGPU/AMDGPUSubtarget.h
318	C++ style comment, capitalized
327	Extra line
lib/Target/AMDGPU/SIRegisterInfo.cpp
1114–1117 ↗	(On Diff #85525)	No return after else

This revision is now accepted and ready to land. Jan 23 2017, 9:05 PM

fix pedantry
fix two bugs found in testing.

Fixed a regression in the path that decides when to emit the spill setup: the isMesaGfx option should have been added alongside the existing check rather than replacing it.

Closed by commit rL293000: AMDGPU add support for spilling to a user sgpr pointed buffers (authored by tstellar). Jan 24 2017, 5:36 PM

This revision was automatically updated to reflect the committed changes.

Diff 74195

lib/Target/AMDGPU/AMDGPU.td

Context not available.
   "Dummy feature to disable assembler instructions"
 >;

+def FeatureSpillUserPtr : SubtargetFeature<"spill-userptr",
+  "EnableSpillUserPtr",
+  "true",
+  "Enable spilling of VGPRs to scratch memory address passed in userdata 0 and 1"
+>;
+
 class SubtargetFeatureGeneration <string Value,
   list<SubtargetFeature> Implies> :
   SubtargetFeature <Value, "Gen", "AMDGPUSubtarget::"#Value,
Context not available.

lib/Target/AMDGPU/AMDGPUSubtarget.h

Context not available.
   bool EnableLoadStoreOpt;
   bool EnableUnsafeDSOffsetFolding;
   bool EnableSIScheduler;
+  bool EnableSpillUserPtr;
   bool DumpCode;

   // Subtarget statically properties set by tablegen
Context not available.
     return FlatForGlobal;
   }

+  bool hasSpillUserPtr() const {
+    return EnableSpillUserPtr;
+  }
+
   bool hasUnalignedBufferAccess() const {
     return UnalignedBufferAccess;
   }
Context not available.

lib/Target/AMDGPU/AMDGPUSubtarget.cpp

Context not available.
   EnableLoadStoreOpt(false),
   EnableUnsafeDSOffsetFolding(false),
   EnableSIScheduler(false),
+  EnableSpillUserPtr(false),
   DumpCode(false),

   FP64(false),
Context not available.

lib/Target/AMDGPU/SIFrameLowering.cpp

Context not available.
     BuildMI(MBB, I, DL, SMovB64, Rsrc23)
       .addReg(Hi, RegState::Kill);
   } else {
-    unsigned Rsrc0 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub0);
-    unsigned Rsrc1 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub1);
     unsigned Rsrc2 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub2);
     unsigned Rsrc3 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub3);

-    // Use relocations to get the pointer, and setup the other bits manually.
     uint64_t Rsrc23 = TII->getScratchRsrcWords23();
-    BuildMI(MBB, I, DL, SMovB32, Rsrc0)
-      .addExternalSymbol("SCRATCH_RSRC_DWORD0")
-      .addReg(ScratchRsrcReg, RegState::ImplicitDefine);
-
-    BuildMI(MBB, I, DL, SMovB32, Rsrc1)
-      .addExternalSymbol("SCRATCH_RSRC_DWORD1")
-      .addReg(ScratchRsrcReg, RegState::ImplicitDefine);
+    if (ST.hasSpillUserPtr()) {
+      unsigned Rsrc01 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub0_sub1);
+      const MCInstrDesc &MoveDwordX2 = TII->get(AMDGPU::S_LOAD_DWORDX2_IMM);
+      unsigned sgpr01 = TRI->getMatchingSuperReg(AMDGPU::SGPR0, AMDGPU::sub0, &AMDGPU::SReg_64RegClass);
+      MRI.addLiveIn(AMDGPU::SGPR0);
+      MBB.addLiveIn(AMDGPU::SGPR0);
+
+      BuildMI(MBB, I, DL, MoveDwordX2, Rsrc01)
+        .addReg(sgpr01, RegState::Undef)
+        .addImm(0);
+      BuildMI(MBB, I, DL, MoveDwordX2, Rsrc01)
+        .addReg(Rsrc01, RegState::Undef, AMDGPU::sub0_sub1)
+        .addImm(0);
+    } else {
+      unsigned Rsrc0 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub0);
+      unsigned Rsrc1 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub1);
+
+      // Use relocations to get the pointer, and setup the other bits manually.
+      BuildMI(MBB, I, DL, SMovB32, Rsrc0)
+        .addExternalSymbol("SCRATCH_RSRC_DWORD0")
+        .addReg(ScratchRsrcReg, RegState::ImplicitDefine);
+
+      BuildMI(MBB, I, DL, SMovB32, Rsrc1)
+        .addExternalSymbol("SCRATCH_RSRC_DWORD1")
+        .addReg(ScratchRsrcReg, RegState::ImplicitDefine);
+    }

     BuildMI(MBB, I, DL, SMovB32, Rsrc2)
       .addImm(Rsrc23 & 0xffffffff)
Context not available.
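The hunk above fills the four dwords of the scratch resource register: dwords 0 and 1 come from the 64-bit pointer, and dwords 2 and 3 come from the constant returned by `getScratchRsrcWords23()`. Below is a rough sketch of that layout outside of LLVM. The function name `buildScratchRsrc` is hypothetical, the alignment check is an assumption of the example, and the diff only shows the low half of `Rsrc23` going into dword 2, so placing the high half in dword 3 is also an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical illustration only; not code from the patch.
// Packs a 64-bit base pointer and the 64-bit "words 2 and 3" constant into
// four 32-bit dwords, lowest dword first.
std::array<uint32_t, 4> buildScratchRsrc(uint64_t BasePtr, uint64_t Rsrc23) {
  // Assumption for this sketch: the base pointer is dword-aligned.
  assert((BasePtr & 0x3) == 0 && "assumed dword-aligned scratch base");
  return {
      static_cast<uint32_t>(BasePtr & 0xffffffff), // dword 0: pointer, low half
      static_cast<uint32_t>(BasePtr >> 32),        // dword 1: pointer, high half
      static_cast<uint32_t>(Rsrc23 & 0xffffffff),  // dword 2: as in the diff
      static_cast<uint32_t>(Rsrc23 >> 32)          // dword 3: assumed high half
  };
}
```

Whether the pointer arrives through relocations or through a load from s[0:1], only dwords 0 and 1 change; dwords 2 and 3 are set the same way in both branches.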

lib/Target/AMDGPU/SIMachineFunctionInfo.h

Context not available.
   bool WorkItemIDY : 1;
   bool WorkItemIDZ : 1;

+  bool SpillUserPtr : 1; // Spill to userdata 0/1
+
   MCPhysReg getNextUserSGPR() const {
     assert(NumSystemSGPRs == 0 && "System SGPRs must be added after user SGPRs");
     return AMDGPU::SGPR0 + NumUserSGPRs;
Context not available.
     return WorkItemIDZ;
   }

+  bool hasSpillUserPtr() const {
+    return SpillUserPtr;
+  }
+
   unsigned getNumUserSGPRs() const {
     return NumUserSGPRs;
   }
Context not available.

lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

Context not available.
     PrivateSegmentWaveByteOffset(false),
     WorkItemIDX(false),
     WorkItemIDY(false),
-    WorkItemIDZ(false) {
+    WorkItemIDZ(false),
+    SpillUserPtr(false) {
   const SISubtarget &ST = MF.getSubtarget<SISubtarget>();
   const Function *F = MF.getFunction();

Context not available.

     if (F->hasFnAttribute("amdgpu-queue-ptr"))
       QueuePtr = true;
-  }
+  } else if (ST.hasSpillUserPtr())
+    SpillUserPtr = true;

   // We don't need to worry about accessing spills with flat instructions.
   // TODO: On VI where we must use flat for global, we should be able to omit
Context not available.

This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU add support for spilling to a user sgpr pointed buffers
ClosedPublic
