This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Correct const_index_stride for wave 32 for PAL ABI
ClosedPublic

Authored by dstuttard on May 4 2021, 5:45 AM.

Details

Summary

Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.

Diff Detail

Event Timeline

dstuttard created this revision.May 4 2021, 5:45 AM
dstuttard requested review of this revision.May 4 2021, 5:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2021, 5:45 AM
foad added a comment.May 4 2021, 5:56 AM

Needs a test case.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
551–553

The comment could be a bit more helpful. As I understand it: index_stride is a 2-bit field in bits 118:117 of the descriptor, so bits 22:21 of this word. (Is this the same for all architectures?) The value coming in might be either 0b10 (stride=32) or 0b11 (stride=64) so we force it to 0b10 by clearing bit 21.

foad added a comment.May 4 2021, 5:57 AM

"API" should be "ABI" in the title.

dstuttard added inline comments.May 4 2021, 6:06 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
551–553

Yes - the driver will always set this up assuming wave64. Just setting the bits to the wave32 option covers that case (as well as the supposedly non-occurring case where the value is already correct for wave32).

How about:
The driver will always set the SRD for wave 64. If the shader is actually wave32 we have to modify the const_index_stride field of the descriptor (bits 22:21) to b10 (stride=32). The reason the driver does this is that there can be cases where it presents 2 shaders with different wave size (e.g. VsFs).

dstuttard updated this revision to Diff 342710.May 4 2021, 6:06 AM

API -> ABI

"API" should be "ABI" in the title.

Hmm, changed the commit, but the review title remains unchanged.

arsenm added inline comments.May 4 2021, 6:16 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
557

Can you use S_BITSET0_B32 instead?

foad added a comment.May 4 2021, 6:18 AM

"API" should be "ABI" in the title.

Hmm, changed the commit, but the review title remains unchanged.

Yeah, arc never seems to update that field once the review has been created. You can do it manually from the web interface.

foad added inline comments.May 4 2021, 6:21 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
551–553

Sounds OK, though I think it would be better to mention the value 0b11 too, otherwise it's not obvious why just setting a single bit to 0 has the desired effect.

Also I've never seen binary literals with just a "b" prefix which is why I used "0b".

dstuttard retitled this revision from AMDGPU: Correct const_index_stride for wave 32 for PAL API to AMDGPU: Correct const_index_stride for wave 32 for PAL ABI.May 4 2021, 6:28 AM
dstuttard added inline comments.
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
557

Could do - the s_and gets transformed somewhere to s_bitset anyway.

dstuttard marked an inline comment as done.May 4 2021, 8:10 AM
dstuttard updated this revision to Diff 342746.May 4 2021, 8:12 AM
dstuttard marked an inline comment as done.

Change to bitset0, extend comment and update a test

foad accepted this revision.May 4 2021, 8:30 AM
This revision is now accepted and ready to land.May 4 2021, 8:30 AM
This revision was landed with ongoing or failed builds.May 7 2021, 4:28 AM
This revision was automatically updated to reflect the committed changes.
RKSimon added a subscriber: RKSimon.May 7 2021, 5:10 AM

@dstuttard I think this is causing errors on EXPENSIVE_CHECKS builds - please can you take a look?

$ ":" "RUN: at line 3"
$ "e:\llvm\ninja\bin\opt.exe" "-S" "-mtriple=amdgcn-amd-amdpal" "-amdgpu-annotate-kernel-features" "E:\llvm\llvm-project\llvm\test\CodeGen\AMDGPU\pal-simple-indirect-call.ll"
$ "e:\llvm\ninja\bin\filecheck.exe" "-check-prefix=GCN" "E:\llvm\llvm-project\llvm\test\CodeGen\AMDGPU\pal-simple-indirect-call.ll"
$ ":" "RUN: at line 6"
$ "e:\llvm\ninja\bin\llc.exe" "-mtriple=amdgcn-amd-amdpal" "-mcpu=gfx900"
$ "e:\llvm\ninja\bin\filecheck.exe" "-check-prefixes=GFX9" "E:\llvm\llvm-project\llvm\test\CodeGen\AMDGPU\pal-simple-indirect-call.ll"
$ ":" "RUN: at line 7"
$ "e:\llvm\ninja\bin\llc.exe" "-mtriple=amdgcn-amd-amdpal" "-mcpu=gfx1010"
# command stderr:

# After Prologue/Epilogue Insertion & Frame Finalization
# Machine code for function test_simple_indirect_call: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten
Function Live Ins: $sgpr0, $sgpr0

bb.0 (%ir-block.0):
  liveins: $sgpr0, $sgpr0
  $sgpr32 = S_MOV_B32 0
  $sgpr36_sgpr37 = S_GETPC_B64
  $sgpr36 = S_MOV_B32 $sgpr0
  $sgpr36_sgpr37_sgpr38_sgpr39 = S_LOAD_DWORDX4_IMM $sgpr36_sgpr37, 16, 0, implicit-def $sgpr36_sgpr37_sgpr38_sgpr39 :: (dereferenceable invariant load 16, align 4, addrspace 4)
  $sgpr39 = S_BITSET0_B32 21, implicit-def $sgpr39
  $sgpr36 = S_ADD_U32 $sgpr36, $sgpr0, implicit-def $scc, implicit-def $sgpr36_sgpr37_sgpr38_sgpr39
  $sgpr37 = S_ADDC_U32 $sgpr37, 0, implicit-def $scc, implicit $scc, implicit-def $sgpr36_sgpr37_sgpr38_sgpr39
  renamable $sgpr4_sgpr5 = S_GETPC_B64
  $sgpr0_sgpr1_sgpr2_sgpr3 = COPY $sgpr36_sgpr37_sgpr38_sgpr39
  dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, 0, <regmask $sgpr32 $sgpr33 $sgpr34 $sgpr35 $sgpr36 $sgpr37 $sgpr38 $sgpr39 $sgpr40 $sgpr41 $sgpr42 $sgpr43 $sgpr44 $sgpr45 $sgpr46 $sgpr47 $sgpr48 $sgpr49 $sgpr50 $sgpr51 $sgpr52 $sgpr53 $sgpr54 $sgpr55 $sgpr56 $sgpr57 $sgpr58 $sgpr59 $sgpr60 $sgpr61 $sgpr62 $sgpr63 $sgpr64 and 1047 more...>, implicit $sgpr0_sgpr1_sgpr2_sgpr3
  S_ENDPGM 0

# End machine code for function test_simple_indirect_call.

*** Bad machine code: Explicit operand marked as def ***
- function:    test_simple_indirect_call
- basic block: %bb.0  (0x195b4d11340)
- instruction: $sgpr39 = S_BITSET0_B32 21, implicit-def $sgpr39
- operand 2:   implicit-def $sgpr39

*** Bad machine code: Explicit operand marked as implicit ***
- function:    test_simple_indirect_call
- basic block: %bb.0  (0x195b4d11340)
- instruction: $sgpr39 = S_BITSET0_B32 21, implicit-def $sgpr39
- operand 2:   implicit-def $sgpr39

*** Bad machine code: Operand should be tied ***
- function:    test_simple_indirect_call
- basic block: %bb.0  (0x195b4d11340)
- instruction: $sgpr39 = S_BITSET0_B32 21, implicit-def $sgpr39
- operand 2:   implicit-def $sgpr39
LLVM ERROR: Found 3 machine code errors.

Yes, thanks. Already reverted.