This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix getEUsPerCU for gfx10 in CU mode
ClosedPublic

Authored by foad on Mar 26 2020, 9:13 AM.

Details

Summary

"Per CU" is a bit simplistic for gfx10, but I couldn't think of a better
name.

Event Timeline

foad created this revision. Mar 26 2020, 9:13 AM
foad added a comment. Mar 26 2020, 9:16 AM

This affects the calculation of how many VGPRs we can use for a gfx10 cu-mode compute shader that specifies amdgpu-flat-work-group-size. How should I test it?
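
To make the effect on the VGPR limit concrete, here is a minimal standalone sketch of the arithmetic, not the code from this patch. ToySubtarget and every function name below are hypothetical stand-ins rather than LLVM's API, and the register file size (1024 VGPRs per SIMD in wave32), the granule of 8 and the cap of 256 addressable VGPRs are assumptions used only for illustration. The one idea taken from the review is that a gfx10 work-group in CU mode is spread over two SIMDs instead of four.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget queries; this is not LLVM's API.
struct ToySubtarget {
  bool IsGFX10;
  bool CuMode;
  unsigned WavefrontSize;   // 32 or 64
  unsigned TotalVGPRsPerEU; // assumed register file size per SIMD
  unsigned VGPRGranule;     // assumed allocation granule
};

// Number of SIMDs ("EUs") that the waves of one work-group are spread over.
// Assumption: two for gfx10 in CU mode, four otherwise.
inline unsigned eusPerCU(const ToySubtarget &ST) {
  return (ST.IsGFX10 && ST.CuMode) ? 2 : 4;
}

inline unsigned divideCeil(unsigned A, unsigned B) { return (A + B - 1) / B; }

// Waves that each SIMD must be able to host for the work-group to fit.
inline unsigned wavesPerEU(const ToySubtarget &ST, unsigned FlatWorkGroupSize) {
  unsigned WavesPerWorkGroup = divideCeil(FlatWorkGroupSize, ST.WavefrontSize);
  return divideCeil(WavesPerWorkGroup, eusPerCU(ST));
}

// VGPR budget per wave: split the register file between the required waves,
// round down to the granule, and cap at the assumed addressable maximum.
inline unsigned maxVGPRs(const ToySubtarget &ST, unsigned FlatWorkGroupSize,
                         unsigned Addressable = 256) {
  unsigned Budget = ST.TotalVGPRsPerEU / wavesPerEU(ST, FlatWorkGroupSize);
  Budget -= Budget % ST.VGPRGranule;
  return Budget < Addressable ? Budget : Addressable;
}
```

Under these assumptions, a 1024-wide work-group in wave32 gets a budget of 64 VGPRs in CU mode, but 128 if four SIMDs are wrongly assumed, which is the kind of difference a test built on amdgpu-flat-work-group-size can observe.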

Seems reasonable, but it needs a test case.

foad updated this revision to Diff 253103. Mar 27 2020, 6:52 AM

Add test.

This revision is now accepted and ready to land. Mar 27 2020, 11:43 AM
This revision was automatically updated to reflect the committed changes.
LuoYuanke added inline comments.
llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll
3

Specifying -regalloc=fast is not reliable. With fast register allocation, LIS = getAnalysisIfAvailable<LiveIntervals>() returns nullptr in the "si-lower-sgpr-spills" pass, so the pass does not create slot indexes for newly inserted instructions. Verifying the machine instructions then fails on the slot index check. It can be reproduced with the test case below. Is it possible to use the greedy register allocator and reduce the compile time for this test case?

define internal void @use256vgprs() {
  %v0 = call i32 asm sideeffect "; def $0", "=v"()
  %v1 = call i32 asm sideeffect "; def $0", "=v"()
  call void asm sideeffect "; use $0", "v"(i32 %v0)
  call void asm sideeffect "; use $0", "v"(i32 %v1)
  ret void
}

define amdgpu_kernel void @f256() #256 {
  call void @use256vgprs()
  ret void
}
attributes #256 = { nounwind "amdgpu-flat-work-group-size"="256,256" }

define amdgpu_kernel void @f512() #512 {
  call void @foo()
  call void @use256vgprs()
  ret void
}
attributes #512 = { nounwind "amdgpu-flat-work-group-size"="512,512" }

define amdgpu_kernel void @f1024() #1024 {
  call void @foo()
  call void @use256vgprs()
  ret void
}

attributes #1024 = { nounwind "amdgpu-flat-work-group-size"="1024,1024" }

declare void @foo()
foad added inline comments. Aug 23 2022, 6:00 AM
llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll
3

That sounds like a bug in SILowerSGPRSpills. It should not claim to preserve SlotIndexes if it does not preserve them.

arsenm added inline comments. Sep 15 2022, 12:08 PM
llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll
3

The pass only updates SlotIndexes through LiveIntervals, assuming the two always come as a pair. With -verify-machineinstrs, we somehow end up with SlotIndexes but without LiveIntervals.
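
The failure mode described here can be shown with a toy model. This is only a sketch: ToySlotIndexes, ToyLiveIntervals and both insert helpers are hypothetical names, not LLVM classes, and the fallback is one possible repair rather than the change that was actually made to the pass.

```cpp
#include <cassert>
#include <set>

// Toy stand-ins with hypothetical names; these are not LLVM's classes.
struct ToySlotIndexes {
  std::set<int> Indexed; // ids of instructions that have a slot index
  void insert(int Instr) { Indexed.insert(Instr); }
  bool has(int Instr) const { return Indexed.count(Instr) != 0; }
};

struct ToyLiveIntervals {
  ToySlotIndexes *Indexes; // live intervals are built on top of the indexes
  void insert(int Instr) { Indexes->insert(Instr); }
};

// Problematic shape: the index map is maintained only when the optional
// live-interval analysis is present, so a new instruction goes unindexed
// when SlotIndexes exists on its own.
inline void insertViaLiveIntervalsOnly(int Instr, ToyLiveIntervals *LIS,
                                       ToySlotIndexes * /*unused*/) {
  if (LIS)
    LIS->insert(Instr);
}

// One possible repair: fall back to updating the index map directly.
inline void insertWithFallback(int Instr, ToyLiveIntervals *LIS,
                               ToySlotIndexes *Indexes) {
  if (LIS)
    LIS->insert(Instr);
  else if (Indexes)
    Indexes->insert(Instr);
}
```

With a null ToyLiveIntervals pointer, the first helper leaves the new instruction without an index, which is the state a verifier would reject. The second helper keeps the index map consistent in both configurations.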

arsenm added inline comments. Sep 15 2022, 12:10 PM
llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll
3

Apparently StackColoring runs at -O0 and only uses SlotIndexes.

arsenm added inline comments. Sep 15 2022, 1:11 PM
llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll
3