This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Set number vgprs used in PS shaders based on input registers actually used
ClosedPublic

Authored by dstuttard on Apr 30 2021, 7:36 AM.

Details

Summary

For PS shaders we can use the input SPI_PS_INPUT_ENA and SPI_PS_INPUT_ADDR
registers

Calculate the number of VGPR registers used as input VGPRs based on these
registers rather than the arguments passed in (this conservatively always
allocates the maximum).

Diff Detail

Event Timeline

dstuttard created this revision.Apr 30 2021, 7:36 AM
dstuttard requested review of this revision.Apr 30 2021, 7:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 30 2021, 7:36 AM
foad added a subscriber: foad.Apr 30 2021, 8:38 AM
foad added inline comments.May 18 2021, 3:39 AM
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1159

Doesn't LastEna need to be initialized here too? I assume all these initializations are just to avoid "may be used uninitialized" warnings.

1163

Should this be getPSInputAddr, and if so why didn't the tests you added fail?

1165

Can InputEna ever be zero, and if so what should LastEna be?

1192

Comma instead of dash?

1216–1218

What effect does setting NumArchVGPR here have, given that it was not being set before?

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
56

This function (and possibly the ones next to it) could do with a comment. It's not obvious to me what TotalNumVGPRs means, given that one of the inputs is already called NumVGPR.

dstuttard updated this revision to Diff 352381.Jun 16 2021, 2:56 AM

Some modifications based on review feedback

dstuttard marked 6 inline comments as done.Jun 16 2021, 2:56 AM
dstuttard added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1159

Yes, I've added an initialization for consistency.

1163

Yes, it should be getPSInputAddr.

It is actually very hard to generate a case where this makes a difference! It is when no input args are used, there is an PSInputAddr that sets inputs as allocated, plus there are extra args, which are also unused. Test is added.

The reason is makes little difference is that the line setting the NumVGPRs uses a max of the value calculated here and the actual allocation done for inputs - this usually fixes any discrepancies.
The reason we can't just use the allocation is that when there are extra arguments it will get a different result.

1165

InputEna can never be zero - I should probably put in an assert for that.

1216–1218

NumArchVGPR was introduced to hold the original NumVGPRs (it's set to that at the start of this function). This is useful on architectures with AGPR support as the NumVGPRs (total vgprs) is adjusted with any AGPR registers that are used.

Since we're altering the original VGPR use with this change - we have to alter that value and set NumVGPRs to a new total with AGPR use (for architectures that support it).

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
56

It is slightly confusing. The issue is that there's a difference between "actual number of VGPR registers required" and the NumVGPRs which is the number of explicitly tracked VGPR registers. Depending on the architecture, AGPR registers are included in the total VGPRs - and there are some alignment constraints too, which make the total different.

dstuttard marked 5 inline comments as done.Jun 16 2021, 2:57 AM

Ping?

Do you think that I should add any other reviewers?

Do you think that I should add any other reviewers?

Probably a good idea!

To confirm, this reduces the minimum number of VGPRs used by PS?
I am not sure the language in the description makes this clear.

The code makes sense to me.
Since this has been in review for a while now, assuming it passes CTS, we should probably accept it?

sebastian-ne added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1154–1155

Checking for AMDGPU_PS is enough, having this calling convention in AmdHsaOS would throw an error in SITargetLowering::LowerFormalArguments.
Also, maybe the name IsPixelShader is clearer? (It’s called that in AMDGPUAtomicOptimizer.cpp.)

1173

nit: I think InputEna || InputAddr is normally used in conditions.

dstuttard updated this revision to Diff 373937.Sep 21 2021, 8:01 AM

Rebase and address some of the review comments

dstuttard marked 2 inline comments as done.Sep 21 2021, 8:02 AM
dstuttard added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1154–1155

Actually, removing the AmdHsaOS check causes one of the lit tests to fail: no-hsa-graphics-shaders.ll

dstuttard marked an inline comment as done.Sep 22 2021, 1:43 AM

@critson @foad @Flakebi - are you happy with the changes? Can someone approve?

sebastian-ne added inline comments.Oct 7 2021, 3:41 AM
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1177

Missing . between the sentences

1188–1213

Could you add braces for the if as well?

1203

Should this be an increment or a += NumRegs?

sebastian-ne added inline comments.Oct 7 2021, 3:50 AM
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1154–1155

The lit test fails because the assertion assert((InputEna || InputAddr)) fails and the error message that amdgpu_ps is unsupported in HSA is not printed anymore. So amdgpu_ps in HSA is illegal. Not sure if there’s a better way to handle this though.

dstuttard updated this revision to Diff 378120.Oct 8 2021, 1:01 AM

Address review comments

dstuttard marked 4 inline comments as done.Oct 8 2021, 1:14 AM
dstuttard added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1154–1155

Let's leave it as-is for now then.

1203

It should be += NumRegs - I've renamed PSVGPRCount to PSArgCount instead

dstuttard marked 2 inline comments as done.Oct 8 2021, 1:15 AM
dstuttard added inline comments.Oct 8 2021, 2:19 AM
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1203

For clarification - I meant, the += NumRegs in the lines above are correct, the ++ on PSArgCount is also correct.
PSArgCount is effectively counting through the bits in InputAddr - each arg can be more than one VGPR hence calculating and adding NumRegs for the NumVGPR counters, but only incrementing the PSArgCount.

sebastian-ne accepted this revision.Oct 8 2021, 2:23 AM

Thanks, LGTM

This revision is now accepted and ready to land.Oct 8 2021, 2:23 AM