This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Reserve extra SGPR blocks wth XNACK "any" TID Setting
ClosedPublic

Authored by kerbowa on Mar 6 2023, 10:28 AM.

Details

Summary

ASMPrinter was relying on feature bits to setup extra SGRPs in the knerel
descriptor for the xnack_mask. This was broken for the dynamic XNACK "any" TID
setting which could cause user SGPRs to be clobbered if the number of SGPRs
reserved was near a granulated block boundary.

When XNACK was enabled this worked correctly in the ASMParser which meant some
kernels were only failing without "-save-temps".

Fixes: SWDEV-382764

Diff Detail

Event Timeline

kerbowa created this revision.Mar 6 2023, 10:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2023, 10:28 AM
kerbowa requested review of this revision.Mar 6 2023, 10:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2023, 10:28 AM
foad added inline comments.Mar 7 2023, 1:33 AM
llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll
4

Could you pipe the binary into llvm-readelf --notes - instead of llvm-objdump, and then match text instead of hex dumps?

kerbowa added inline comments.Mar 7 2023, 9:40 AM
llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll
4

The notes section contains metadata that may not match the kernel descriptor (KD) in .rodata. In a failed test, the metadata was the same in both passing and failing cases, but the KD was different. I can add an extra runline though.

kerbowa updated this revision to Diff 503077.Mar 7 2023, 9:48 AM

Add readelf run-line to test.

kzhuravl added inline comments.Mar 7 2023, 3:45 PM
llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll
4

What was the differfence in the kernel descriptor between passing and failing case?

kerbowa added inline comments.Mar 7 2023, 5:11 PM
llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll
4

The difference was the granulated SGPR field. It was 1 in the failing case, 2 in the passing version. I can't really replicate it here since the only reason it was passing with save-temps, is that the ASM parser was correctly checking the dynamic target ID setting and encoding the correct kernel descriptor. We already have MC tests for that. The metadata and what the ASM printer is doing has always been incorrect.

kzhuravl accepted this revision.EditedMar 8 2023, 2:19 AM

LGTM, but please add a todo or a fixme (see comment below), thanks

llvm/test/CodeGen/AMDGPU/tid-kd-xnack-any.ll
4

There are several internal discussions related to asm directives and I think we are going to end up adding the assembler directive for granulated sgpr count. Once that directive is added we can change this test from checking hex to checking that directory. Would you mind adding a TODO or a FIXME saying this test needs to be updated to use the directive instead of hex once the directive is available?

This revision is now accepted and ready to land.Mar 8 2023, 2:19 AM

Actually, this is breaking tests with non-HSA. Is TargetID relevant for pal/graphics/ect @foad, or should the default there be XNACK- in the absence of any explicit subtarget features being added?

kerbowa updated this revision to Diff 503461.Mar 8 2023, 11:38 AM

Update tests.

Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2023, 11:38 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
kerbowa added a reviewer: Restricted Project.Mar 13 2023, 8:24 AM

Added AMDGPU group to reviewers.

Is there any objection to changing the defaults for subtargets that support XNACK to always reserve extra SGPRs unless -xnack is explicitly requested? This would impact graphics as well. The old defaults were doing the opposite and only reserving the extra SGPRs with +xnack, meaning the default in the absence of either +/-xnack will be changing.