Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Needs a test case.
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
551–553 | The comment could be a bit more helpful. As I understand it: index_stride is a 2-bit field in bits 118:117 of the descriptor, so bits 22:21 of this word. (Is this the same for all architectures?) The value coming in might be either 0b10 (stride=32) or 0b11 (stride=64) so we force it to 0b10 by clearing bit 21. |
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
551–553 | Yes - the driver will always set this up assuming wave64. Just setting the bits to the wave32 option covers that case (as well as the supposedly non-occurring case where the value is already correct for wave32). How about: |
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
557 | Can you use S_BITSET0_B32 instead? |
Yeah, arc never seems to update that field once the review has been created. You can do it manually from the web interface.
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
551–553 | Sounds OK, though I think it would be better to mention the value 0b11 too, otherwise it's not obvious why just setting a single bit to 0 has the desired effect. Also I've never seen binary literals with just a "b" prefix which is why I used "0b". |
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
557 | Could do - the s_and gets transformed somewhere to s_bitset anyway. |
@dstuttard I think this is causing errors on EXPENSIVE_CHECKS builds - please can you take a look?
$ ":" "RUN: at line 3" $ "e:\llvm\ninja\bin\opt.exe" "-S" "-mtriple=amdgcn-amd-amdpal" "-amdgpu-annotate-kernel-features" "E:\llvm\llvm-project\llvm\test\CodeGen\AMDGPU\pal-simple-indirect-call.ll" $ "e:\llvm\ninja\bin\filecheck.exe" "-check-prefix=GCN" "E:\llvm\llvm-project\llvm\test\CodeGen\AMDGPU\pal-simple-indirect-call.ll" $ ":" "RUN: at line 6" $ "e:\llvm\ninja\bin\llc.exe" "-mtriple=amdgcn-amd-amdpal" "-mcpu=gfx900" $ "e:\llvm\ninja\bin\filecheck.exe" "-check-prefixes=GFX9" "E:\llvm\llvm-project\llvm\test\CodeGen\AMDGPU\pal-simple-indirect-call.ll" $ ":" "RUN: at line 7" $ "e:\llvm\ninja\bin\llc.exe" "-mtriple=amdgcn-amd-amdpal" "-mcpu=gfx1010" # command stderr: # After Prologue/Epilogue Insertion & Frame Finalization # Machine code for function test_simple_indirect_call: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten Function Live Ins: $sgpr0, $sgpr0 bb.0 (%ir-block.0): liveins: $sgpr0, $sgpr0 $sgpr32 = S_MOV_B32 0 $sgpr36_sgpr37 = S_GETPC_B64 $sgpr36 = S_MOV_B32 $sgpr0 $sgpr36_sgpr37_sgpr38_sgpr39 = S_LOAD_DWORDX4_IMM $sgpr36_sgpr37, 16, 0, implicit-def $sgpr36_sgpr37_sgpr38_sgpr39 :: (dereferenceable invariant load 16, align 4, addrspace 4) $sgpr39 = S_BITSET0_B32 21, implicit-def $sgpr39 $sgpr36 = S_ADD_U32 $sgpr36, $sgpr0, implicit-def $scc, implicit-def $sgpr36_sgpr37_sgpr38_sgpr39 $sgpr37 = S_ADDC_U32 $sgpr37, 0, implicit-def $scc, implicit $scc, implicit-def $sgpr36_sgpr37_sgpr38_sgpr39 renamable $sgpr4_sgpr5 = S_GETPC_B64 $sgpr0_sgpr1_sgpr2_sgpr3 = COPY $sgpr36_sgpr37_sgpr38_sgpr39 dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, 0, <regmask $sgpr32 $sgpr33 $sgpr34 $sgpr35 $sgpr36 $sgpr37 $sgpr38 $sgpr39 $sgpr40 $sgpr41 $sgpr42 $sgpr43 $sgpr44 $sgpr45 $sgpr46 $sgpr47 $sgpr48 $sgpr49 $sgpr50 $sgpr51 $sgpr52 $sgpr53 $sgpr54 $sgpr55 $sgpr56 $sgpr57 $sgpr58 $sgpr59 $sgpr60 $sgpr61 $sgpr62 $sgpr63 $sgpr64 and 1047 more...>, implicit $sgpr0_sgpr1_sgpr2_sgpr3 S_ENDPGM 0 # End machine code for function test_simple_indirect_call. *** Bad machine code: Explicit operand marked as def *** - function: test_simple_indirect_call - basic block: %bb.0 (0x195b4d11340) - instruction: $sgpr39 = S_BITSET0_B32 21, implicit-def $sgpr39 - operand 2: implicit-def $sgpr39 *** Bad machine code: Explicit operand marked as implicit *** - function: test_simple_indirect_call - basic block: %bb.0 (0x195b4d11340) - instruction: $sgpr39 = S_BITSET0_B32 21, implicit-def $sgpr39 - operand 2: implicit-def $sgpr39 *** Bad machine code: Operand should be tied *** - function: test_simple_indirect_call - basic block: %bb.0 (0x195b4d11340) - instruction: $sgpr39 = S_BITSET0_B32 21, implicit-def $sgpr39 - operand 2: implicit-def $sgpr39 LLVM ERROR: Found 3 machine code errors.
The comment could be a bit more helpful. As I understand it: index_stride is a 2-bit field in bits 118:117 of the descriptor, so bits 22:21 of this word. (Is this the same for all architectures?) The value coming in might be either 0b10 (stride=32) or 0b11 (stride=64) so we force it to 0b10 by clearing bit 21.