This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/test/CodeGen/AMDGPU/
-
test/
-
CodeGen/
-
AMDGPU/
-
attr-amdgpu-flat-work-group-size-vgpr-limit.ll

Differential D131931

[amdgpu] Change the RA to basic
ClosedPublic

Authored by LuoYuanke on Aug 15 2022, 5:25 PM.

Download Raw Diff

Details

Reviewers

foad
arsenm
rampitec

Commits

rG28733d86cf7b: [amdgpu] Change the RA to basic

Summary

Specifying -regalloc=fast is not reliable. With fast register allocation,
LIS = getAnalysisIfAvailable<LiveIntervals>(); get nullptr
in "si-lower-sgpr-spills" pass, so the slot index is not created in the
pass for new inserted instructions. When verifying the machine
instruction, it fails on checking slot index. While greedy-ra is time
consuming basic-ra can be used to reduce compiling time for this test case.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

LuoYuanke created this revision.Aug 15 2022, 5:25 PM

Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2022, 5:25 PM

Herald added subscribers: kosarev, kerbowa, t-tye and 5 others. · View Herald Transcript

LuoYuanke requested review of this revision.Aug 15 2022, 5:25 PM

Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2022, 5:25 PM

Herald added subscribers: llvm-commits, wdng. · View Herald Transcript

LuoYuanke added reviewers: foad, arsenm, rampitec.Aug 15 2022, 5:27 PM

Harbormaster completed remote builds in B181414: Diff 452854.Aug 15 2022, 6:16 PM

Typo in description: "intruction".

I also think spilling should be fixed to work with fast-ra instead.

LuoYuanke edited the summary of this revision. (Show Details)Aug 16 2022, 5:04 PM

In D131931#3726692, @rampitec wrote:

Typo in description: "intruction".

Fixed, thanks.

I also think spilling should be fixed to work with fast-ra instead.

Fast-ra doesn't depened on SlotIndexes and LiveIntervals, but "si-lower-sgpr-spills" depend on LiveIntervals, so I am not sure we can fix. On the other hand basic-ra fit this case very well, it reduce the compiling time and depend on SlotIndexes and LiveIntervals. Even we use fast-ra the pass pipeline doesn't change much which means it fast-ra won't reduce the compiling time for other passes, it only reduce the compiling time of ra itself.

In D131931#3727692, @LuoYuanke wrote:

In D131931#3726692, @rampitec wrote:

Typo in description: "intruction".

Fixed, thanks.

I also think spilling should be fixed to work with fast-ra instead.

Fast-ra doesn't depened on SlotIndexes and LiveIntervals, but "si-lower-sgpr-spills" depend on LiveIntervals, so I am not sure we can fix. On the other hand basic-ra fit this case very well, it reduce the compiling time and depend on SlotIndexes and LiveIntervals. Even we use fast-ra the pass pipeline doesn't change much which means it fast-ra won't reduce the compiling time for other passes, it only reduce the compiling time of ra itself.

OK, using fast-ra is not a point of this test anyway, and that shall not be your problem to fix sgpr spilling even if we need it.

This revision is now accepted and ready to land.Aug 17 2022, 11:38 AM

Closed by commit rG28733d86cf7b: [amdgpu] Change the RA to basic (authored by LuoYuanke). · Explain WhyAug 17 2022, 5:18 PM

This revision was automatically updated to reflect the committed changes.

LuoYuanke added a commit: rG28733d86cf7b: [amdgpu] Change the RA to basic.

Specifying -regalloc=fast is not reliable. With fast register allocation,
LIS = getAnalysisIfAvailable<LiveIntervals>(); get nullptr
in "si-lower-sgpr-spills" pass, so the slot index is not created in the
pass for new inserted instructions.

That sounds like a bug in SILowerSGPRSpills. It claims to preserve all analyses, so it should be able to preserve SlotIndexex even if LiveIntervals is not available.

When verifying the machine
instruction, it fails on checking slot index.

No, this test does not fail - it has been passing for over two years. So what problem are you trying to fix? Are you testing some local changes that provoked a failure?

No, this test does not fail - it has been passing for over two years. So what problem are you trying to fix? Are you testing some local changes that provoked a failure?

This case doesn't fail, but the similar case (as below) would fail in SILowerSGPRSpills pass. Replacing it with basic-ra has no side effect.

define internal void @use256vgprs() {
  %v0 = call i32 asm sideeffect "; def $0", "=v"()
  %v1 = call i32 asm sideeffect "; def $0", "=v"()
  call void asm sideeffect "; use $0", "v"(i32 %v0)
  call void asm sideeffect "; use $0", "v"(i32 %v1)
  ret void
}

define amdgpu_kernel void @f256() #256 {
  call void @use256vgprs()
  ret void
}
attributes #256 = { nounwind "amdgpu-flat-work-group-size"="256,256" }

define amdgpu_kernel void @f512() #512 {
  call void @foo()
  call void @use256vgprs()
  ret void
}
attributes #512 = { nounwind "amdgpu-flat-work-group-size"="512,512" }

define amdgpu_kernel void @f1024() #1024 {
  call void @foo()
  call void @use256vgprs()
  ret void
}

attributes #1024 = { nounwind "amdgpu-flat-work-group-size"="1024,1024" }

declare void @foo()

arsenm added a reverting change: rGe7cf44e72329: Revert "[amdgpu] Change the RA to basic".Jan 12 2023, 5:32 AM

Revision Contents

Path

Size

llvm/

test/

CodeGen/

AMDGPU/

attr-amdgpu-flat-work-group-size-vgpr-limit.ll

22 lines

Diff 453480

llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll

	; -enable-misched=false makes the register usage more predictable			; -enable-misched=false makes the register usage more predictable
	; -regalloc=fast just makes the test run faster			; -regalloc=basic just makes the test run faster
	; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=fast -vgpr-regalloc=fast < %s \| FileCheck %s --check-prefixes=GCN,GFX9			; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=basic -vgpr-regalloc=basic < %s \| FileCheck %s --check-prefixes=GCN,GFX9
	; RUN: llc -march=amdgcn -mcpu=gfx90a -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=fast -vgpr-regalloc=fast < %s \| FileCheck %s --check-prefixes=GCN,GFX90A			; RUN: llc -march=amdgcn -mcpu=gfx90a -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=basic -vgpr-regalloc=basic < %s \| FileCheck %s --check-prefixes=GCN,GFX90A
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=fast -vgpr-regalloc=fast < %s \| FileCheck %s --check-prefixes=GCN,GFX10WGP-WAVE32			; RUN: llc -march=amdgcn -mcpu=gfx1010 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=basic -vgpr-regalloc=basic < %s \| FileCheck %s --check-prefixes=GCN,GFX10WGP-WAVE32
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize64 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=fast -vgpr-regalloc=fast < %s \| FileCheck %s --check-prefixes=GCN,GFX10WGP-WAVE64			; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize64 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=basic -vgpr-regalloc=basic < %s \| FileCheck %s --check-prefixes=GCN,GFX10WGP-WAVE64
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+cumode -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=fast -vgpr-regalloc=fast < %s \| FileCheck %s --check-prefixes=GCN,GFX10CU-WAVE32			; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+cumode -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=basic -vgpr-regalloc=basic < %s \| FileCheck %s --check-prefixes=GCN,GFX10CU-WAVE32
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+cumode,+wavefrontsize64 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=fast -vgpr-regalloc=fast < %s \| FileCheck %s --check-prefixes=GCN,GFX10CU-WAVE64			; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+cumode,+wavefrontsize64 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=basic -vgpr-regalloc=basic < %s \| FileCheck %s --check-prefixes=GCN,GFX10CU-WAVE64
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=fast -vgpr-regalloc=fast < %s \| FileCheck %s --check-prefixes=GCN,GFX11WGP-WAVE32			; RUN: llc -march=amdgcn -mcpu=gfx1100 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=basic -vgpr-regalloc=basic < %s \| FileCheck %s --check-prefixes=GCN,GFX11WGP-WAVE32
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=+wavefrontsize64 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=fast -vgpr-regalloc=fast < %s \| FileCheck %s --check-prefixes=GCN,GFX11WGP-WAVE64			; RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=+wavefrontsize64 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=basic -vgpr-regalloc=basic < %s \| FileCheck %s --check-prefixes=GCN,GFX11WGP-WAVE64
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=+cumode -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=fast -vgpr-regalloc=fast < %s \| FileCheck %s --check-prefixes=GCN,GFX11CU-WAVE32			; RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=+cumode -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=basic -vgpr-regalloc=basic < %s \| FileCheck %s --check-prefixes=GCN,GFX11CU-WAVE32
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=+cumode,+wavefrontsize64 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=fast -vgpr-regalloc=fast < %s \| FileCheck %s --check-prefixes=GCN,GFX11CU-WAVE64			; RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=+cumode,+wavefrontsize64 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=basic -vgpr-regalloc=basic < %s \| FileCheck %s --check-prefixes=GCN,GFX11CU-WAVE64

	define internal void @use256vgprs() {			define internal void @use256vgprs() {
	%v0 = call i32 asm sideeffect "; def $0", "=v"()			%v0 = call i32 asm sideeffect "; def $0", "=v"()
	%v1 = call i32 asm sideeffect "; def $0", "=v"()			%v1 = call i32 asm sideeffect "; def $0", "=v"()
	%v2 = call i32 asm sideeffect "; def $0", "=v"()			%v2 = call i32 asm sideeffect "; def $0", "=v"()
	%v3 = call i32 asm sideeffect "; def $0", "=v"()			%v3 = call i32 asm sideeffect "; def $0", "=v"()
	%v4 = call i32 asm sideeffect "; def $0", "=v"()			%v4 = call i32 asm sideeffect "; def $0", "=v"()
	%v5 = call i32 asm sideeffect "; def $0", "=v"()			%v5 = call i32 asm sideeffect "; def $0", "=v"()
	▲ Show 20 Lines • Show All 569 Lines • Show Last 20 Lines