Pal uses a different mechanism for user sgprs.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
I don't see why PAL would get special treatment here. I think the problem is we don't really try to account for the formal arguments in the user SGPR count. A better fix would be to track that accurately
Pal uses a completely different approach where the front end does all the setup for the user sgprs and passes the info in via pal metadata. The backend can override/modify this, but for user data sgprs is generally doesn't. Moving the pal shaders to use the same model would be a fairly big piece of work, and would mean adding quite a lot of extra code to the backend that's specific to pal.
I'm not suggesting moving it into the same model, but the backend should figure out the user SGPR count given the formal argument list in the IR
I'm not suggesting moving it into the same model, but the backend should figure out the user SGPR count given the formal argument list in the IR
David is right -- we use a completely different model for PAL. The middle-end compiler LGC (part of LLPC) determines what user data sgprs there are, and what hardware-set sgprs (e.g. threadgroup_id0/1/2 in a CS) it wants to enable. It sets all the relevant bits and fields in the rsrc2 passed in IR metadata, and it adds the right number and type of inreg args to the function in the IR. For PAL, the backend would not modify those fields and bits in rsrc2 at all.
In the PAL model, the backend only sets fields that only it can know about, such as vgpr usage.
So this fix looks correct to me for PAL.
Additionally, not everything marked as inreg actually contributes to the NumUserSGPRs - so the knowledge about which arguments are and aren't would need to be added to the backend.
To be honest, it looks like the entire function should only be called for ROCm-style compute. It shouldn't be used for either PAL or Mesa. Except maybe the Info.hasPrivateSegmentWaveByteOffset part? I'm not sure about that last one. But none of the hasWorkGroupFoo stuff should apply as far as I can tell.
And by the way, why is that a function of the OS? Shouldn't it be a function of the calling convention?
So to sum up: I agree with Tim and David that this code shouldn't be applied in the PAL/Mesa case (they both need their own fix for this) -- though keeping the assertion alive somewhere would be good. But it's not clear to me that this patch is the right way to do it.
It feels somehwat dubious that this entire method runs for an amdgpu_cs calling convention, but perhaps we don't need to change that, given the method already has an IsShader check. How about: apply the init16 bug workaround only if !IsShader, but keep the assertion unmodified?
I wasn't sure whether Mesa would also have issues - I was going to follow that up, so thanks for the info.
I'll update as you suggest to !IsShader
As for the assert - pal shaders won't return anything useful for Info.getNumPreloadedSGPRs() so the assert would always fail. I assume (perhaps wrongly) that Mesa would be the same?
Thanks.
Okay, no big deal. I thought it would return the number of SGPR arguments (since those are preloaded, it would kind of make sense).
Added a test
I thought the best bet for this was something that showed the original patch
didn't alter user sgpr registers (and left RSRC2 as-is).