This is an archive of the discontinued LLVM Phabricator instance.

[amdgpu] Change the RA to basic
ClosedPublic

Authored by LuoYuanke on Aug 15 2022, 5:25 PM.

Details

Summary

Specifying -regalloc=fast is not reliable. With fast register allocation,
LIS = getAnalysisIfAvailable<LiveIntervals>(); get nullptr
in "si-lower-sgpr-spills" pass, so the slot index is not created in the
pass for new inserted instructions. When verifying the machine
instruction, it fails on checking slot index. While greedy-ra is time
consuming basic-ra can be used to reduce compiling time for this test case.

Diff Detail

Event Timeline

LuoYuanke created this revision.Aug 15 2022, 5:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2022, 5:25 PM
LuoYuanke requested review of this revision.Aug 15 2022, 5:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2022, 5:25 PM

Typo in description: "intruction".

I also think spilling should be fixed to work with fast-ra instead.

LuoYuanke edited the summary of this revision. (Show Details)Aug 16 2022, 5:04 PM

Typo in description: "intruction".

Fixed, thanks.

I also think spilling should be fixed to work with fast-ra instead.

Fast-ra doesn't depened on SlotIndexes and LiveIntervals, but "si-lower-sgpr-spills" depend on LiveIntervals, so I am not sure we can fix. On the other hand basic-ra fit this case very well, it reduce the compiling time and depend on SlotIndexes and LiveIntervals. Even we use fast-ra the pass pipeline doesn't change much which means it fast-ra won't reduce the compiling time for other passes, it only reduce the compiling time of ra itself.

rampitec accepted this revision.Aug 17 2022, 11:38 AM

Typo in description: "intruction".

Fixed, thanks.

I also think spilling should be fixed to work with fast-ra instead.

Fast-ra doesn't depened on SlotIndexes and LiveIntervals, but "si-lower-sgpr-spills" depend on LiveIntervals, so I am not sure we can fix. On the other hand basic-ra fit this case very well, it reduce the compiling time and depend on SlotIndexes and LiveIntervals. Even we use fast-ra the pass pipeline doesn't change much which means it fast-ra won't reduce the compiling time for other passes, it only reduce the compiling time of ra itself.

OK, using fast-ra is not a point of this test anyway, and that shall not be your problem to fix sgpr spilling even if we need it.

This revision is now accepted and ready to land.Aug 17 2022, 11:38 AM
This revision was automatically updated to reflect the committed changes.
foad added a comment.Aug 23 2022, 6:17 AM

Specifying -regalloc=fast is not reliable. With fast register allocation,
LIS = getAnalysisIfAvailable<LiveIntervals>(); get nullptr
in "si-lower-sgpr-spills" pass, so the slot index is not created in the
pass for new inserted instructions.

That sounds like a bug in SILowerSGPRSpills. It claims to preserve all analyses, so it should be able to preserve SlotIndexex even if LiveIntervals is not available.

When verifying the machine
instruction, it fails on checking slot index.

No, this test does not fail - it has been passing for over two years. So what problem are you trying to fix? Are you testing some local changes that provoked a failure?

No, this test does not fail - it has been passing for over two years. So what problem are you trying to fix? Are you testing some local changes that provoked a failure?

This case doesn't fail, but the similar case (as below) would fail in SILowerSGPRSpills pass. Replacing it with basic-ra has no side effect.

define internal void @use256vgprs() {
  %v0 = call i32 asm sideeffect "; def $0", "=v"()
  %v1 = call i32 asm sideeffect "; def $0", "=v"()
  call void asm sideeffect "; use $0", "v"(i32 %v0)
  call void asm sideeffect "; use $0", "v"(i32 %v1)
  ret void
}

define amdgpu_kernel void @f256() #256 {
  call void @use256vgprs()
  ret void
}
attributes #256 = { nounwind "amdgpu-flat-work-group-size"="256,256" }

define amdgpu_kernel void @f512() #512 {
  call void @foo()
  call void @use256vgprs()
  ret void
}
attributes #512 = { nounwind "amdgpu-flat-work-group-size"="512,512" }

define amdgpu_kernel void @f1024() #1024 {
  call void @foo()
  call void @use256vgprs()
  ret void
}

attributes #1024 = { nounwind "amdgpu-flat-work-group-size"="1024,1024" }

declare void @foo()