This is an archive of the discontinued LLVM Phabricator instance.

Differential D84194

[AMDGPU] Correct the number of SGPR blocks used for GFX9
Abandoned · Public

Authored by rochauha on Jul 20 2020, 11:58 AM.

Details

Reviewers

scott.linder
t-tye
arsenm

Summary

Edit: Updated the summary based on review comments.

Even though granularity is 8, the roundup must be an even number of 8-granules for GFX9.
Probably this also needs to be mentioned in https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-compute-pgm-rsrc1-gfx6-gfx10-table for GRANULATED_WAVEFRONT_SGPR_COUNT.

The difference is seen when the rounded value aligns to 8 but not to 16 (for example, 40 or 56).
This patch corrects the roundup for GFX9, hence correcting the number of SGPRBlocks.
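
To make the roundup described in the summary concrete, here is a minimal sketch. It only illustrates the arithmetic; `roundUpTo`, `roundToGranule` and `roundToEvenGranules` are hypothetical names, not functions from the patch, and the granule sizes 8 and 16 are taken from the summary.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::alignTo: rounds Value up to a multiple of Align.
unsigned roundUpTo(unsigned Value, unsigned Align) {
  assert(Align != 0 && "alignment must be non-zero");
  return (Value + Align - 1) / Align * Align;
}

// Rounds an SGPR count up to a whole number of 8-register granules.
unsigned roundToGranule(unsigned NumSGPRs) { return roundUpTo(NumSGPRs, 8); }

// Rounds an SGPR count up to an even number of 8-register granules,
// which is the same as rounding up to a multiple of 16.
unsigned roundToEvenGranules(unsigned NumSGPRs) {
  return roundUpTo(NumSGPRs, 16);
}
```

With these helpers, a count of 33 rounds to 40 under the first rule and to 48 under the second, and 49 rounds to 56 and 64 respectively. A count that is already a multiple of 16, such as 32, is unchanged by either rule.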

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
880 ms	linux > LLVM.CodeGen/AMDGPU/GlobalISel::Unknown Unit Message ("")
50 ms	linux > LLVM.CodeGen/WebAssembly::Unknown Unit Message ("")
960 ms	windows > LLVM.CodeGen/AMDGPU/GlobalISel::Unknown Unit Message ("")
30 ms	windows > LLVM.CodeGen/WebAssembly::Unknown Unit Message ("")

Event Timeline

rochauha created this revision. · Jul 20 2020, 11:58 AM

Herald added a project: Restricted Project. · Jul 20 2020, 11:58 AM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 8 others.

Harbormaster failed remote builds in B64971: Diff 279319! · Jul 20 2020, 10:46 PM

Needs test

This revision now requires changes to proceed. · Jul 21 2020, 7:13 AM

foad added a subscriber: foad. · Jul 21 2020, 7:54 AM

foad added inline comments.

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
439–441	Why have you changed this?

rochauha marked an inline comment as done. · Jul 21 2020, 8:28 AM

rochauha added inline comments.

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
439–441	To follow the computation of `GRANULATED_WAVEFRONT_SGPR_COUNT` for GFX9, as mentioned in https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-compute-pgm-rsrc1-gfx6-gfx10-table

rochauha marked an inline comment as not done. · Jul 21 2020, 8:29 AM

In D84194#2164227, @arsenm wrote:

Needs test

I think these changes are tested using the test in https://reviews.llvm.org/D80713.
In fact the issue was found when testing round tripping for the above patch. I guess this can only be verified by a round trip test when we assemble->disassemble->re-assemble? Such a test is already present in the patch for D80713.

I am not sure how else can we look at the value of GRANULATED_WAVEFRONT_SGPR_COUNT in a test case.

In D84194#2164406, @rochauha wrote:

I think these changes are tested using the test in https://reviews.llvm.org/D80713.
In fact the issue was found when testing round tripping for the above patch. I guess this can only be verified by a round trip test when we assemble->disassemble->re-assemble? Such a test is already present in the patch for D80713.

I am not sure how else can we look at the value of GRANULATED_WAVEFRONT_SGPR_COUNT in a test case.

I think you should be able to test this by adding another KD case to llvm/test/MC/AMDGPU/hsa-v3.s, and just checking the hexdump of the KD as for the other cases there. It should be pretty painless, you can just copy-paste the minimal one, set the SGPR count to trigger the bug, and update the GRANULATED_WAVEFRONT_SGPR_COUNT bits in the expected dump.
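
As a rough aid for reading the expected dump in such a test, the sketch below decodes one 8-digit word of the `llvm-objdump -s` output and pulls out the SGPR field. It assumes that GRANULATED_WAVEFRONT_SGPR_COUNT occupies bits 9:6 of compute_pgm_rsrc1, as listed in AMDGPUUsage, and that the word being decoded is compute_pgm_rsrc1. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Converts one 8-digit word from an objdump row (bytes shown in memory
// order) into the little-endian 32-bit value it encodes.
uint32_t wordFromDump(const std::string &Hex) {
  assert(Hex.size() == 8 && "expected exactly four bytes of hex");
  uint32_t Value = 0;
  for (unsigned Byte = 0; Byte < 4; ++Byte) {
    uint32_t B = static_cast<uint32_t>(
        std::stoul(Hex.substr(2 * Byte, 2), nullptr, 16));
    Value |= B << (8 * Byte);
  }
  return Value;
}

// Extracts GRANULATED_WAVEFRONT_SGPR_COUNT, assumed here to occupy
// bits 9:6 of compute_pgm_rsrc1.
unsigned granulatedSGPRCount(uint32_t ComputePgmRsrc1) {
  return (ComputePgmRsrc1 >> 6) & 0xF;
}
```

Under those assumptions, the word `0001ac00` decodes to 0x00ac0100 and a field value of 4, while `c2500104` and `82500104` give field values of 3 and 2.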

scott.linder added inline comments. · Jul 21 2020, 3:08 PM

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
348	I don't know if this is actually accurate. I think the reason for the "2 *" in the equation for GFX9 is not that the allocation granule is 16. It is still 8 for gfx9, but there is an additional constraint that you must allocate an even number of granules. It is a bit confusing, and I would like @kzhuravl to weigh in, as IIRC he was the one who originally helped me understand this when we were updating the assembler.
439–440	If the above is true, and the granule for gfx9 is in fact 8, then I would just move all of the handling of the "even" requirement into this function, i.e. change this to: unsigned NumSGPRBlocks = NumSGPRs / (isGFX9(STI) ? 2 * getSGPREncodingGranule(STI) : getSGPREncodingGranule(STI)) - 1;

foad added inline comments. · Jul 22 2020, 12:32 AM

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
439–440	The current patch does have the advantage that it closely matches the documentation that Ronak pointed to. Though I suppose we could update the documentation too.

foad added inline comments. · Jul 22 2020, 12:34 AM

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
434–440	Incidentally the alignTo and the division could be combined into a single call to divideCeil.
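
The equivalence behind this suggestion can be sketched as follows. `alignToSketch` and `divideCeilSketch` are hypothetical stand-ins written for this example, so the snippet does not depend on LLVM headers; they are only meant to mirror the behaviour of `alignTo` and `divideCeil` for unsigned values.

```cpp
#include <cassert>

// Stand-in for alignTo: rounds Value up to a multiple of Align.
unsigned alignToSketch(unsigned Value, unsigned Align) {
  assert(Align != 0);
  return (Value + Align - 1) / Align * Align;
}

// Stand-in for divideCeil: integer division rounded up.
unsigned divideCeilSketch(unsigned Numerator, unsigned Denominator) {
  assert(Denominator != 0);
  return (Numerator + Denominator - 1) / Denominator;
}

// Two-step form: align first, then divide by the granule.
unsigned granulesTwoStep(unsigned NumSGPRs, unsigned Granule) {
  return alignToSketch(NumSGPRs, Granule) / Granule;
}

// One-step form: a single rounded-up division.
unsigned granulesOneStep(unsigned NumSGPRs, unsigned Granule) {
  return divideCeilSketch(NumSGPRs, Granule);
}
```

Both forms return 5 for 33 SGPRs with a granule of 8, and they agree for any other count as well, since aligning and then dividing is just a rounded-up division.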

t-tye added inline comments. · Jul 22 2020, 12:59 AM

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
348	For GFX9 the granularity is as specified in AMDGPUUsage which is 8. As @scott.linder mentions SPI rounds up to an even number of 8-granules. From the hardware spec: Number of SGPRS, granularity 8. SPI rounds up reg setting and allocs gran16. Range is from 0-13 allocating (SGPRS/2+1)*16: 16,16,32,32 ... 112,112
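
The allocation rule quoted from the hardware spec can be written out as a small function. This is only a sketch of the quoted formula; `gfx9AllocatedSGPRs` is a hypothetical name.

```cpp
#include <cassert>

// SGPRs allocated for a given register setting on GFX9, following the
// quoted formula (SGPRS / 2 + 1) * 16 with integer division.
unsigned gfx9AllocatedSGPRs(unsigned Field) {
  assert(Field <= 13 && "the quoted range is 0-13");
  return (Field / 2 + 1) * 16;
}
```

Settings 0 and 1 both allocate 16 registers, 2 and 3 both allocate 32, and 12 and 13 both allocate 112, which matches the sequence 16,16,32,32 ... 112,112 in the comment.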

rochauha edited the summary of this revision. · Jul 22 2020, 11:45 AM

Updated patch based on comments.
Updated old tests.
Added new test.

rochauha marked 3 inline comments as done. · Jul 22 2020, 11:59 AM

rochauha added inline comments.

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
434–440	Done.

Harbormaster failed remote builds in B65271: Diff 279902! · Jul 22 2020, 12:28 PM

foad added inline comments. · Jul 23 2020, 1:20 AM

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
439–440	Don't you still need a std::max somewhere in here to cope with the NumSGPRs==0 case?

Added missing std::max.

rochauha marked 2 inline comments as done. · Jul 23 2020, 1:55 AM

rochauha added inline comments.

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
439–440	Done. Thanks!
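
The reason the `std::max` matters can be shown with a short sketch. The helper names are hypothetical and the granule is passed in explicitly; the point is only the unsigned wrap-around when the count is zero.

```cpp
#include <algorithm>
#include <cassert>

// Stand-in for divideCeil: integer division rounded up.
unsigned divideCeilSketch(unsigned Numerator, unsigned Denominator) {
  assert(Denominator != 0);
  return (Numerator + Denominator - 1) / Denominator;
}

// Without a lower bound, zero SGPRs gives 0 - 1, which wraps around
// to the largest unsigned value.
unsigned blocksUnclamped(unsigned NumSGPRs, unsigned Granule) {
  return divideCeilSketch(NumSGPRs, Granule) - 1;
}

// Clamping the count to at least 1 keeps the result at 0 for zero SGPRs.
unsigned blocksClamped(unsigned NumSGPRs, unsigned Granule) {
  return divideCeilSketch(std::max(1u, NumSGPRs), Granule) - 1;
}
```

For a count of 0 and a granule of 8, the unclamped version wraps around to the maximum `unsigned` value, whereas the clamped version returns 0. For non-zero counts the two versions agree.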

Harbormaster failed remote builds in B65344: Diff 280046! · Jul 23 2020, 2:25 AM

I discussed with Tony today, and I was thinking about this the wrong way.

SPI does not require the granule count to be even, it just rounds up the granule count before actually performing the allocation. This means, from the compiler's perspective, when it is calculating things like the AMDGPU::IsaInfo::getMaxNumSGPRs it must consider the "allocation" granule size (IsaInfo::getSGPRAllocGranule). Conversely, from the assembler/disassembler perspective, it must consider the "encoding" granule size (IsaInfo::getSGPREncodingGranule). It is perfectly OK to have a GFX9 code object with a granulated SGPR count of 1, and we should allow emitting that in the assembler so that the disassembler can accurately reproduce those code objects.

I don't think there is any fix needed here, we already separate these two concepts and correctly apply them elsewhere. I think I just led you astray in the disassembly patch; you should only be using the encoding granule size, and shouldn't need any special handling for e.g. GFX9 to handle the fact that the allocation and encoding granule sizes are not equal.
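
A minimal sketch of the two concepts, assuming the GFX9 values discussed in this review (encoding granule 8, allocation granule 16) and a 4-bit field. The constants and function names are hypothetical and are not the `IsaInfo` API.

```cpp
#include <cassert>

// Assumed GFX9 values from the discussion: the descriptor field is encoded
// in units of 8 registers, while the hardware allocates in units of 16.
const unsigned EncodingGranule = 8;
const unsigned AllocGranule = 16;

// SGPR count represented by an encoded field value (assumed to be 4 bits).
unsigned encodedSGPRs(unsigned Field) {
  assert(Field <= 15 && "field is assumed to be 4 bits wide");
  return (Field + 1) * EncodingGranule;
}

// SGPRs the hardware reserves after rounding the encoded count up to the
// allocation granule.
unsigned allocatedSGPRs(unsigned Field) {
  unsigned Encoded = encodedSGPRs(Field);
  return (Encoded + AllocGranule - 1) / AllocGranule * AllocGranule;
}
```

In this model an odd field value such as 1 is perfectly representable: it encodes 16 SGPRs and 16 are allocated. A field value of 2 encodes 24 SGPRs, for which 32 are allocated. The encoded value and the allocated amount are separate quantities, so the assembler and disassembler only need the first.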

In D84194#2170882, @scott.linder wrote:

I discussed with Tony today, and I was thinking about this the wrong way.

SPI does not require the granule count to be even, it just rounds up the granule count before actually performing the allocation. This means, from the compiler's perspective, when it is calculating things like the AMDGPU::IsaInfo::getMaxNumSGPRs it must consider the "allocation" granule size (IsaInfo::getSGPRAllocGranule). Conversely, from the assembler/disassembler perspective, it must consider the "encoding" granule size (IsaInfo::getSGPREncodingGranule). It is perfectly OK to have a GFX9 code object with a granulated SGPR count of 1, and we should allow emitting that in the assembler so that the disassembler can accurately reproduce those code objects.

I don't think there is any fix needed here, we already separate these two concepts and correctly apply them elsewhere. I think I just led you astray in the disassembly patch; you should only be using the encoding granule size, and shouldn't need any special handling for e.g. GFX9 to handle the fact that the allocation and encoding granule sizes are not equal.

Correct me if I'm wrong. So we must not take inverse of the mentioned GFX9 calculation (the one where we divide by 16 before roundup) as it is for allocation granule size? And hence the disassembly computation will be same for GFX6-8 and GFX9 (because the encoding granule size is the same)?

In D84194#2173959, @rochauha wrote:

Correct me if I'm wrong. So we must not take inverse of the mentioned GFX9 calculation (the one where we divide by 16 before roundup) as it is for allocation granule size? And hence the disassembly computation will be same for GFX6-8 and GFX9 (because the encoding granule size is the same)?

Correct, you can treat all hardware the same and calculate:

NumSGPRs = (NumSGPRBlocks + 1) * getSGPREncodingGranule()

I still think it might be good to make this into a function in AMDGPU::IsaInfo to be the inverse of getNumSGPRBlocks
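
One possible shape for such an inverse, as a sketch only. `numSGPRsFromBlocks` is a hypothetical name, the forward function follows the pre-patch `getNumSGPRBlocks` from the diff with the subtarget argument dropped, and the field is assumed to be 4 bits wide.

```cpp
#include <algorithm>
#include <cassert>

// Value returned by getSGPREncodingGranule in the diff.
const unsigned EncodingGranule = 8;

// Forward direction, following the pre-patch getNumSGPRBlocks.
unsigned numSGPRBlocks(unsigned NumSGPRs) {
  unsigned Aligned = (std::max(1u, NumSGPRs) + EncodingGranule - 1) /
                     EncodingGranule * EncodingGranule;
  return Aligned / EncodingGranule - 1;
}

// Hypothetical inverse, using the formula given in the review.
unsigned numSGPRsFromBlocks(unsigned NumSGPRBlocks) {
  return (NumSGPRBlocks + 1) * EncodingGranule;
}

// Checks that every value of the (assumed 4-bit) field survives a decode
// followed by a re-encode.
void checkRoundTrip() {
  for (unsigned Blocks = 0; Blocks < 16; ++Blocks)
    assert(numSGPRBlocks(numSGPRsFromBlocks(Blocks)) == Blocks);
}
```

The inverse recovers the rounded count, not the original one: 33 SGPRs encode as 4 blocks, and 4 blocks decode to 40 SGPRs.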

Based on comments and discussion, the difference for GFX9 is being handled using allocation granule sizes and no change is required.

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp	8 lines
llvm/test/MC/AMDGPU/hsa-v3.s	32 lines

Diff 280046

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp

[... 339 lines not shown, context: if (Version.Major >= 10) ...]
     return getAddressableNumSGPRs(STI);
   if (Version.Major >= 8)
     return 16;
   return 8;
 }

 unsigned getSGPREncodingGranule(const MCSubtargetInfo *STI) {
   return 8;
 }

 unsigned getTotalNumSGPRs(const MCSubtargetInfo *STI) {
   IsaVersion Version = getIsaVersion(STI->getCPU());
   if (Version.Major >= 8)
     return 800;
   return 512;
 }

[... 69 lines not shown ...]

 unsigned getNumExtraSGPRs(const MCSubtargetInfo *STI, bool VCCUsed,
                           bool FlatScrUsed) {
   return getNumExtraSGPRs(STI, VCCUsed, FlatScrUsed,
                           STI->getFeatureBits().test(AMDGPU::FeatureXNACK));
 }

 unsigned getNumSGPRBlocks(const MCSubtargetInfo *STI, unsigned NumSGPRs) {
-  NumSGPRs = alignTo(std::max(1u, NumSGPRs), getSGPREncodingGranule(STI));
+  // Even though granularity is 8, the roundup must be an even number of
+  // 8-granules for GFX9.
+  unsigned Alignment = isGFX9(STI) ? getSGPREncodingGranule(STI) * 2
+                                   : getSGPREncodingGranule(STI);
   // SGPRBlocks is actual number of SGPR blocks minus 1.
-  return NumSGPRs / getSGPREncodingGranule(STI) - 1;
+  unsigned NumSGPRBlocks = divideCeil(std::max(1u, NumSGPRs), Alignment) - 1;
+  return isGFX9(STI) ? NumSGPRBlocks * 2 : NumSGPRBlocks;
 }

 unsigned getVGPRAllocGranule(const MCSubtargetInfo *STI,
                              Optional<bool> EnableWavefrontSize32) {
   bool IsWave32 = EnableWavefrontSize32 ?
     *EnableWavefrontSize32 :
     STI->getFeatureBits().test(FeatureWavefrontSize32);

   if (hasGFX10_3Insts(*STI))
[... 1,058 lines not shown ...]
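
To see what the changed lines compute, the following sketch evaluates the old and the patched GFX9 formulas side by side. It is a standalone illustration: the subtarget argument is dropped, the granule of 8 is hard-coded, and `blocksBefore`, `blocksAfterGFX9` and `divideCeilSketch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Value returned by getSGPREncodingGranule in the diff.
const unsigned Granule = 8;

// Stand-in for divideCeil: integer division rounded up.
unsigned divideCeilSketch(unsigned Numerator, unsigned Denominator) {
  assert(Denominator != 0);
  return (Numerator + Denominator - 1) / Denominator;
}

// Pre-patch computation: count 8-register granules, then subtract 1.
unsigned blocksBefore(unsigned NumSGPRs) {
  return divideCeilSketch(std::max(1u, NumSGPRs), Granule) - 1;
}

// Patched GFX9 computation: count 16-register units, subtract 1, then double.
unsigned blocksAfterGFX9(unsigned NumSGPRs) {
  return (divideCeilSketch(std::max(1u, NumSGPRs), 2 * Granule) - 1) * 2;
}
```

The patched formula always yields an even value. For 32 SGPRs the result goes from 3 to 2, and for 48 from 5 to 4, while for 33 both formulas give 4.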

llvm/test/MC/AMDGPU/hsa-v3.s

 // RUN: llvm-mc -mattr=+code-object-v3 -triple amdgcn-amd-amdhsa -mcpu=gfx904 -mattr=+xnack < %s | FileCheck --check-prefix=ASM %s
 // RUN: llvm-mc -mattr=+code-object-v3 -triple amdgcn-amd-amdhsa -mcpu=gfx904 -mattr=+xnack -filetype=obj < %s > %t
 // RUN: llvm-readelf -sections -symbols -relocations %t | FileCheck --check-prefix=READOBJ %s
 // RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s

 // big endian not supported
 // XFAIL: host-byteorder-big-endian

 // READOBJ: Section Headers
 // READOBJ: .text PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} {{[0-9a-f]+}} {{[0-9]+}} AX {{[0-9]+}} {{[0-9]+}} 256
-// READOBJ: .rodata PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} 000100 {{[0-9]+}} A {{[0-9]+}} {{[0-9]+}} 64
+// READOBJ: .rodata PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} 000140 {{[0-9]+}} A {{[0-9]+}} {{[0-9]+}} 64

 // READOBJ: Relocation section '.rela.rodata' at offset
 // READOBJ: 0000000000000010 {{[0-9a-f]+}}00000005 R_AMDGPU_REL64 0000000000000000 .text + 10
 // READOBJ: 0000000000000050 {{[0-9a-f]+}}00000005 R_AMDGPU_REL64 0000000000000000 .text + 110
 // READOBJ: 0000000000000090 {{[0-9a-f]+}}00000005 R_AMDGPU_REL64 0000000000000000 .text + 210
 // READOBJ: 00000000000000d0 {{[0-9a-f]+}}00000005 R_AMDGPU_REL64 0000000000000000 .text + 310
+// READOBJ: 0000000000000110 {{[0-9a-f]+}}00000005 R_AMDGPU_REL64 0000000000000000 .text + 410

 // READOBJ: Symbol table '.symtab' contains {{[0-9]+}} entries:
 // READOBJ: {{[0-9]+}}: 0000000000000100 0 FUNC LOCAL PROTECTED 2 complete
 // READOBJ: {{[0-9]+}}: 0000000000000040 64 OBJECT LOCAL DEFAULT 3 complete.kd
 // READOBJ: {{[0-9]+}}: 0000000000000300 0 FUNC LOCAL PROTECTED 2 disabled_user_sgpr
 // READOBJ: {{[0-9]+}}: 00000000000000c0 64 OBJECT LOCAL DEFAULT 3 disabled_user_sgpr.kd
+// READOBJ: {{[0-9]+}}: 0000000000000400 0 FUNC LOCAL PROTECTED 2 gfx9_sgpr
+// READOBJ: {{[0-9]+}}: 0000000000000100 64 OBJECT LOCAL DEFAULT 3 gfx9_sgpr.kd
 // READOBJ: {{[0-9]+}}: 0000000000000000 0 FUNC LOCAL PROTECTED 2 minimal
 // READOBJ: {{[0-9]+}}: 0000000000000000 64 OBJECT LOCAL DEFAULT 3 minimal.kd
 // READOBJ: {{[0-9]+}}: 0000000000000200 0 FUNC LOCAL PROTECTED 2 special_sgpr
 // READOBJ: {{[0-9]+}}: 0000000000000080 64 OBJECT LOCAL DEFAULT 3 special_sgpr.kd

 // OBJDUMP: Contents of section .rodata
 // Note, relocation for KERNEL_CODE_ENTRY_BYTE_OFFSET is not resolved here.
 // minimal
 // OBJDUMP-NEXT: 0000 00000000 00000000 00000000 00000000
 // OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
 // OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000
 // OBJDUMP-NEXT: 0030 0000ac00 80000000 00000000 00000000
 // complete
 // OBJDUMP-NEXT: 0040 01000000 01000000 00000000 00000000
 // OBJDUMP-NEXT: 0050 00000000 00000000 00000000 00000000
 // OBJDUMP-NEXT: 0060 00000000 00000000 00000000 00000000
-// OBJDUMP-NEXT: 0070 c2500104 1f0f007f 7f000000 00000000
+// OBJDUMP-NEXT: 0070 82500104 1f0f007f 7f000000 00000000
 // special_sgpr
 // OBJDUMP-NEXT: 0080 00000000 00000000 00000000 00000000
 // OBJDUMP-NEXT: 0090 00000000 00000000 00000000 00000000
 // OBJDUMP-NEXT: 00a0 00000000 00000000 00000000 00000000
 // OBJDUMP-NEXT: 00b0 00010000 80000000 00000000 00000000
 // disabled_user_sgpr
 // OBJDUMP-NEXT: 00c0 00000000 00000000 00000000 00000000
 // OBJDUMP-NEXT: 00d0 00000000 00000000 00000000 00000000
 // OBJDUMP-NEXT: 00e0 00000000 00000000 00000000 00000000
 // OBJDUMP-NEXT: 00f0 0000ac00 80000000 00000000 00000000
+// gfx9_sgpr
+// OBJDUMP-NEXT: 0100 00000000 00000000 00000000 00000000
+// OBJDUMP-NEXT: 0110 00000000 00000000 00000000 00000000
+// OBJDUMP-NEXT: 0120 00000000 00000000 00000000 00000000
+// OBJDUMP-NEXT: 0130 0001ac00 80000000 00000000 00000000


 .text
 // ASM: .text

 .amdgcn_target "amdgcn-amd-amdhsa--gfx904+xnack"
 // ASM: .amdgcn_target "amdgcn-amd-amdhsa--gfx904+xnack"

 .p2align 8
[... 11 lines not shown ...]
 special_sgpr:
   s_endpgm

 .p2align 8
 .type disabled_user_sgpr,@function
 disabled_user_sgpr:
   s_endpgm

+.p2align 8
+.type gfx9_sgpr,@function
+gfx9_sgpr:
+  s_endpgm


 .rodata
 // ASM: .rodata

 // Test that only specifying required directives is allowed, and that defaulted
 // values are omitted.
 .p2align 6
 .amdhsa_kernel minimal
   .amdhsa_next_free_vgpr 0
[... 118 lines not shown, context: .amdhsa_kernel disabled_user_sgpr ...]
   .amdhsa_next_free_sgpr 0
 .end_amdhsa_kernel

 // ASM: .amdhsa_kernel disabled_user_sgpr
 // ASM: .amdhsa_next_free_vgpr 0
 // ASM-NEXT: .amdhsa_next_free_sgpr 0
 // ASM: .end_amdhsa_kernel

+// Test GRANULATED_WAVEFRONT_SGPR_COUNT for GFX9
+.p2align 6
+.amdhsa_kernel gfx9_sgpr
+  .amdhsa_next_free_vgpr 0
+  .amdhsa_next_free_sgpr 33
+.end_amdhsa_kernel

+// ASM: .amdhsa_kernel gfx9_sgpr
+// ASM: .amdhsa_next_free_vgpr 0
+// ASM-NEXT: .amdhsa_next_free_sgpr 33
+// ASM: .end_amdhsa_kernel


 .section .foo

 .byte .amdgcn.gfx_generation_number
 // ASM: .byte 9

 .byte .amdgcn.gfx_generation_minor
 // ASM: .byte 0

[... 85 lines not shown ...]
