This is an archive of the discontinued LLVM Phabricator instance.

Differential D121914

[AMDGPU] Stop using getMinimalPhysRegClass in LowerFormalArguments
Closed Public

Authored by foad on Mar 17 2022, 7:52 AM.


Details

Reviewers

arsenm
vangthao
rampitec
Joe_Nash

Commits

rG313f306b2684: [AMDGPU] Stop using getMinimalPhysRegClass in LowerFormalArguments

Summary

NFCI. The motivation for this is to avoid problems in future if we add new
classes containing only a subset of all VGPRs, or a subset of all SGPRs.
getMinimalPhysRegClass would favour these smaller classes, which is not
what we want here.
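
To illustrate the concern in the summary, here is a minimal, self-contained sketch that does not use LLVM at all. `ToyRegClass`, `getMinimalToyClass`, `minimalClassForReg0`, and the class name `VGPR_32_Subset` are hypothetical names invented for this example, and "smallest containing class" is only an assumed simplification of what getMinimalPhysRegClass does. The point is that a "minimal class" query changes its answer as soon as a smaller class containing the same register exists.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a register class: a name plus the register numbers it
// contains. This is NOT llvm::TargetRegisterClass.
struct ToyRegClass {
  std::string Name;
  std::vector<unsigned> Regs;

  bool contains(unsigned Reg) const {
    for (unsigned R : Regs)
      if (R == Reg)
        return true;
    return false;
  }
};

// Simplified "minimal class" query: among all classes containing Reg, return
// the one with the fewest registers. (Assumption: the real LLVM query is
// based on sub-class relationships, not raw sizes.)
const ToyRegClass *getMinimalToyClass(const std::vector<ToyRegClass> &Classes,
                                      unsigned Reg) {
  const ToyRegClass *Best = nullptr;
  for (const ToyRegClass &RC : Classes)
    if (RC.contains(Reg) && (!Best || RC.Regs.size() < Best->Regs.size()))
      Best = &RC;
  return Best;
}

// Name of the minimal class for register 0, with or without a hypothetical
// subset class added next to the full class.
std::string minimalClassForReg0(bool AddSubsetClass) {
  std::vector<ToyRegClass> Classes = {{"VGPR_32", {0, 1, 2, 3}}};
  if (AddSubsetClass)
    Classes.push_back({"VGPR_32_Subset", {0, 1}}); // hypothetical class
  const ToyRegClass *RC = getMinimalToyClass(Classes, 0);
  assert(RC && "register 0 is always in VGPR_32");
  return RC->Name;
}
```

In this toy model, `minimalClassForReg0(false)` names the full class, while `minimalClassForReg0(true)` names the subset class, even though nothing about register 0 itself changed.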

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

foad created this revision. Mar 17 2022, 7:52 AM

Herald added a project: Restricted Project. Mar 17 2022, 7:52 AM

Herald added subscribers: kerbowa, hiraditya, t-tye and 6 others.

foad requested review of this revision. Mar 17 2022, 7:52 AM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B154837: Diff 416178. Mar 17 2022, 7:53 AM

foad added inline comments. Mar 17 2022, 7:54 AM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
Line 2551: As an alternative, I did try implementing a getMaximalAllocatablePhysRegClass and using it here. That seemed to cause other problems, but I can pursue it further if you think it's the right way to go.

arsenm accepted this revision. Mar 17 2022, 7:55 AM

This revision is now accepted and ready to land. Mar 17 2022, 7:55 AM

I think a switch case over known RegClasses that we want to use is the way to go. The only question is do we need to handle more cases. Is this passing testing? When I put assert(VT.getSizeInBits() == 32) here I get massive test failures. So what happens to wider types, they all become VGPR_32?

In D121914#3389338, @Joe_Nash wrote:

I think a switch case over known RegClasses that we want to use is the way to go. The only question is do we need to handle more cases. Is this passing testing? When I put assert(VT.getSizeInBits() == 32) here I get massive test failures. So what happens to wider types, they all become VGPR_32?

All arguments are supposed to be split into 32 bit pieces before they reach here

In D121914#3389343, @arsenm wrote:

In D121914#3389338, @Joe_Nash wrote:

I think a switch case over known RegClasses that we want to use is the way to go. The only question is do we need to handle more cases. Is this passing testing? When I put assert(VT.getSizeInBits() == 32) here I get massive test failures. So what happens to wider types, they all become VGPR_32?

All arguments are supposed to be split into 32 bit pieces before they reach here

I'm assuming your failures are on 16-bit types which still use a 32-bit register class

In D121914#3389338, @Joe_Nash wrote:

Is this passing testing?

Yes.

When I put assert(VT.getSizeInBits() == 32) here I get massive test failures. So what happens to wider types, they all become VGPR_32?

See the calling conv definitions in AMDGPUCallingConv.td. All supported types (including i16 and f16) are passed in 32-bit GPRs, either VGPRn or SGPRn.

In D121914#3389344, @arsenm wrote:

In D121914#3389343, @arsenm wrote:

In D121914#3389338, @Joe_Nash wrote:

I think a switch case over known RegClasses that we want to use is the way to go. The only question is do we need to handle more cases. Is this passing testing? When I put assert(VT.getSizeInBits() == 32) here I get massive test failures. So what happens to wider types, they all become VGPR_32?

All arguments are supposed to be split into 32 bit pieces before they reach here

I'm assuming your failures are on 16-bit types which still use a 32-bit register class

Yep. When I put assert(VT.getSizeInBits() <= 32) they go away.

Ok. LGTM

In D121914#3389338, @Joe_Nash wrote:

I think a switch case over known RegClasses that we want to use is the way to go.

Do you prefer this (cribbed from SITargetLowering::insertCopiesSplitCSR)?

const TargetRegisterClass *RC = nullptr;
if (AMDGPU::SGPR_32RegClass.contains(Reg))
  RC = &AMDGPU::SGPR_32RegClass;
else if (AMDGPU::VGPR_32RegClass.contains(Reg))
  RC = &AMDGPU::VGPR_32RegClass;
else
  llvm_unreachable("Unexpected register class in LowerFormalArguments!");

As you say, it seems guaranteed to be SGPR or VGPR by the calling convention. They compile to the same thing under optimization, right? If so, I slightly prefer the llvm_unreachable version because the intent is locally clear.
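
The "intent is locally clear" point can be sketched outside LLVM as follows. This is only an illustration: the register numbering, `ToyClass`, `isToySGPR`, `isToyVGPR`, and `pickArgumentClass` are all invented for the example, and a thrown `std::logic_error` stands in for llvm_unreachable so that the sketch stays standalone.

```cpp
#include <stdexcept>

// Toy register numbering (an assumption, not the real AMDGPU encoding):
// 0..99 are 32-bit SGPRs and 100..199 are 32-bit VGPRs.
enum class ToyClass { SGPR_32, VGPR_32 };

bool isToySGPR(unsigned Reg) { return Reg < 100; }
bool isToyVGPR(unsigned Reg) { return Reg >= 100 && Reg < 200; }

// Explicitly choose between the only two classes the caller expects. Adding
// further classes elsewhere cannot change the result, because no other class
// is ever consulted here.
ToyClass pickArgumentClass(unsigned Reg) {
  if (isToySGPR(Reg))
    return ToyClass::SGPR_32;
  if (isToyVGPR(Reg))
    return ToyClass::VGPR_32;
  // Stand-in for llvm_unreachable: any other register is a logic error.
  throw std::logic_error("Unexpected register class in LowerFormalArguments!");
}
```

With this toy numbering, `pickArgumentClass(5)` yields `ToyClass::SGPR_32`, `pickArgumentClass(150)` yields `ToyClass::VGPR_32`, and any register outside both ranges is reported as an error at the point where the assumption is made.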

This revision was landed with ongoing or failed builds. Mar 17 2022, 8:24 AM

Closed by commit rG313f306b2684: [AMDGPU] Stop using getMinimalPhysRegClass in LowerFormalArguments (authored by foad).

This revision was automatically updated to reflect the committed changes.

foad added a commit: rG313f306b2684: [AMDGPU] Stop using getMinimalPhysRegClass in LowerFormalArguments.

Revision Contents

Path: llvm/lib/Target/AMDGPU/SIISelLowering.cpp
Size: 8 lines

Diff 416187

llvm/lib/Target/AMDGPU/SIISelLowering.cpp


   if (IsEntryFunc && VA.isMemLoc()) {
     [...]
     if (!Arg.Flags.isByVal())
       Chains.push_back(Val.getValue(1));
     continue;
   }

   assert(VA.isRegLoc() && "Parameter must be in a register!");

   Register Reg = VA.getLocReg();
-  const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(Reg, VT);
+  const TargetRegisterClass *RC = nullptr;
+  if (AMDGPU::VGPR_32RegClass.contains(Reg))
+    RC = &AMDGPU::VGPR_32RegClass;
+  else if (AMDGPU::SGPR_32RegClass.contains(Reg))
+    RC = &AMDGPU::SGPR_32RegClass;
+  else
+    llvm_unreachable("Unexpected register class in LowerFormalArguments!");
   EVT ValVT = VA.getValVT();

   Reg = MF.addLiveIn(Reg, RC);
   SDValue Val = DAG.getCopyFromReg(Chain, DL, Reg, VT);

   if (Arg.Flags.isSRet()) {
     // The return object should be reasonably addressable.