This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
737	Your change is definitely a great improvement; a lot of sp3-based MIMG tests now pass. However, I'm not sure that a16 should affect the size of gradients. gfx10_shader_programming only says that gradients are packed for g16 opcodes. sp3 code does not pack gradients for a16=1 either. When I remove this condition from your patch, I see some improvements in the test pass rate for _d and _cd opcodes. Below are a few tests which fail with your patch but pass if the IsA16 condition is removed:
image_sample_cd v[5:6], v[1:8], s[8:15], s[12:15] dmask:0x3 dim:SQ_RSRC_IMG_2D a16
image_sample_cd v[5:6], v[1:8], s[8:15], s[12:15] dmask:0x3 dim:SQ_RSRC_IMG_CUBE a16
image_sample_cd v[5:6], v[1:8], s[8:15], s[12:15] dmask:0x3 dim:SQ_RSRC_IMG_2D_ARRAY a16

dp added inline comments. May 4 2021, 7:47 AM

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
737	My previous comment was intended for your change in the parser, sorry. https://reviews.llvm.org/D101619

sebastian-ne added a subscriber: sebastian-ne. May 4 2021, 8:50 AM

sebastian-ne added inline comments.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
737	The code in SIISelLowering suggests that A16 always needs G16 if derivatives are specified explicitly: https://github.com/llvm/llvm-project/blob/8e211bf1c895a31b3e9f49014b5494d8e1dabcf6/llvm/lib/Target/AMDGPU/SIISelLowering.cpp#L6098-L6103 I remember something like A16 implies G16, but I don’t remember where that comes from. IIRC sp3 often shows larger registers than are actually used. LLVM is a lot stricter there.

dp added inline comments. May 4 2021, 10:38 AM

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
737	> IIRC sp3 often shows larger registers than are actually used. LLVM is a lot stricter there.
That is not always true. Actually, llvm may align the MIMG address size to 8/16: https://github.com/llvm/llvm-project/blob/e1c729c56829d3b9502b9ac2439003f87231db50/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp#L3436
> I remember something like A16 implies G16, but I don’t remember where that comes from.
Is this a feature of our compiler or of AMD H/W? I have just checked that the latest sp3 does distinguish g16 and a16. Below are some examples (valid sp3 code):
image_sample_d_g16 v[0:3], [v0, v2, v4, v6], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D
image_sample_d_g16 v[0:3], [v0, v2, v4], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
image_sample_d v[0:3], [v0, v2, v4, v6, v8, v9], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D
image_sample_d v[0:3], [v0, v2, v4, v6, v8], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16

dstuttard added inline comments. May 5 2021, 1:11 AM

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
737	I've done some more digging and I can confirm that the A16 should not be used for gradient packing. I'll update the code in all locations and re-submit.
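
The conclusion of this thread (a16 halves only the coordinate and lod/clamp/mip components, while gradients are packed only for g16 opcodes) can be sketched as a standalone calculation. This is a minimal illustration, not the code that was eventually committed: ImgOp, ImgDim, addrDwords and roundNonNSA are hypothetical names, the two structs are simplified stand-ins for the MIMGBaseOpcodeInfo and MIMGDimInfo fields read in the patch, and the 2D values (2 coordinates, 4 gradients) are assumed from the sp3 examples quoted above.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the fields the patch reads from
// MIMGBaseOpcodeInfo and MIMGDimInfo. These are not the LLVM types.
struct ImgOp {
  unsigned NumExtraArgs; // extra address arguments of the opcode
  bool Gradients;        // opcode takes explicit derivatives (_d, _cd)
  bool Coordinates;      // opcode takes coordinates
  bool LodOrClampOrMip;  // opcode takes one lod/clamp/mip argument
  bool G16;              // *_g16 opcode: 16-bit gradients
};

struct ImgDim {
  unsigned NumCoords;
  unsigned NumGradients;
};

// Local helpers so the sketch does not depend on LLVM's MathExtras.h.
unsigned divideCeil(unsigned N, unsigned D) { return (N + D - 1) / D; }
unsigned alignTo2(unsigned N) { return (N + 1) & ~1u; }

// Number of address dwords before any non-NSA rounding.
// a16 packs only coordinates and lod/clamp/mip; gradients are packed
// only when the opcode itself is a g16 opcode.
unsigned addrDwords(const ImgOp &Op, const ImgDim &Dim, bool IsA16) {
  unsigned Size = Op.NumExtraArgs;
  unsigned Components =
      (Op.Coordinates ? Dim.NumCoords : 0) + (Op.LodOrClampOrMip ? 1 : 0);
  if (IsA16)
    Components = divideCeil(Components, 2);
  Size += Components;
  if (Op.Gradients)
    Size += Op.G16 ? alignTo2(Dim.NumGradients / 2) : Dim.NumGradients;
  return Size;
}

// Non-NSA encodings use one contiguous register tuple whose size is
// rounded up to 8 or 16 dwords, as in the diff under review.
unsigned roundNonNSA(unsigned Size) {
  if (Size > 8)
    return 16;
  if (Size > 4)
    return 8;
  return Size;
}

// Checks the sketch against the four NSA sp3 examples quoted above.
void checkSp3Examples() {
  const ImgDim Dim2D{2, 4};
  const ImgOp SampleD{0, true, true, false, false};
  const ImgOp SampleDG16{0, true, true, false, true};
  assert(addrDwords(SampleDG16, Dim2D, false) == 4);
  assert(addrDwords(SampleDG16, Dim2D, true) == 3);
  assert(addrDwords(SampleD, Dim2D, false) == 6);
  assert(addrDwords(SampleD, Dim2D, true) == 5);
}
```

Under these assumptions, a non-NSA image_sample_cd with dim 2D and a16 needs 5 dwords, which rounds up to the 8-register tuple v[1:8] used in the failing tests listed earlier. Packing gradients for a16 as well would give 3 dwords, which no longer matches that tuple.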

dstuttard mentioned this in D102066: [AMDGPU] Fix codegen of image intrinsics for g16 and a16. May 7 2021, 4:43 AM

I've combined this with some other changes into D102231 - easier than trying to separate all the various bits

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp	24 lines
llvm/test/MC/Disassembler/AMDGPU/mimg_gfx10.txt	109 lines

Diff 341861

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp

 DecodeStatus AMDGPUDisassembler::convertMIMGInst(MCInst &MI) const {
 [...]
   bool IsGather4 = MCII->get(MI.getOpcode()).TSFlags & SIInstrFlags::Gather4;

   bool IsNSA = false;
   unsigned AddrSize = Info->VAddrDwords;

   if (STI.getFeatureBits()[AMDGPU::FeatureGFX10]) {
     unsigned DimIdx =
         AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::dim);
+    int A16Idx =
+        AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::a16);
     const AMDGPU::MIMGBaseOpcodeInfo *BaseOpcode =
         AMDGPU::getMIMGBaseOpcodeInfo(Info->BaseOpcode);
     const AMDGPU::MIMGDimInfo *Dim =
         AMDGPU::getMIMGDimInfoByEncoding(MI.getOperand(DimIdx).getImm());
+    const bool IsA16 = (A16Idx != -1 && MI.getOperand(A16Idx).getImm());

-    AddrSize = BaseOpcode->NumExtraArgs +
-               (BaseOpcode->Gradients ? Dim->NumGradients : 0) +
-               (BaseOpcode->Coordinates ? Dim->NumCoords : 0) +
-               (BaseOpcode->LodOrClampOrMip ? 1 : 0);
+    // This mimics the calculation of AddrSize in
+    // SIInstrInfo::verifyInstruction.
+    AddrSize = BaseOpcode->NumExtraArgs;
+    unsigned AddrComponents = (BaseOpcode->Coordinates ? Dim->NumCoords : 0) +
+                              (BaseOpcode->LodOrClampOrMip ? 1 : 0);
+    if (IsA16)
+      AddrComponents = divideCeil(AddrComponents, 2);
+
+    AddrSize += AddrComponents;
+
+    if (BaseOpcode->Gradients) {
+      if (IsA16 || BaseOpcode->G16)
+        AddrSize += alignTo<2>(Dim->NumGradients / 2);
+      else
+        AddrSize += Dim->NumGradients;
+    }
+
     IsNSA = Info->MIMGEncoding == AMDGPU::MIMGEncGfx10NSA;
     if (!IsNSA) {
       if (AddrSize > 8)
         AddrSize = 16;
       else if (AddrSize > 4)
         AddrSize = 8;
     } else {
       if (AddrSize > Info->VAddrDwords) {
 [...]

llvm/test/MC/Disassembler/AMDGPU/mimg_gfx10.txt

This file was added.

				# RUN: llvm-mc -arch=amdgcn -mcpu=gfx1010 -disassemble -show-encoding < %s | FileCheck %s -check-prefix=GFX10

				# GFX10: image_load v[4:6], v238, s[28:35] dmask:0x7 dim:SQ_RSRC_IMG_1D unorm ; encoding: [0x00,0x17,0x00,0xf0,0xee,0x04,0x07,0x00]
				0x00,0x17,0x00,0xf0,0xee,0x04,0x07,0x00

				# GFX10: image_load_pck v5, v0, s[8:15] dmask:0x1 dim:SQ_RSRC_IMG_1D glc ; encoding: [0x00,0x21,0x08,0xf0,0x00,0x05,0x02,0x00]
				0x00,0x21,0x08,0xf0,0x00,0x05,0x02,0x00

				# GFX10: image_load_pck_sgn v5, v0, s[8:15] dmask:0x1 dim:SQ_RSRC_IMG_1D lwe ; encoding: [0x00,0x01,0x0e,0xf0,0x00,0x05,0x02,0x00]
				0x00,0x01,0x0e,0xf0,0x00,0x05,0x02,0x00

				# GFX10: image_load_mip v5, v[0:1], s[8:15] dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x00,0x04,0xf0,0x00,0x05,0x02,0x00]
				0x00,0x00,0x04,0xf0,0x00,0x05,0x02,0x00

				# GFX10: image_load_mip_pck v5, v[1:2], s[8:15] dmask:0x1 dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x01,0x10,0xf0,0x01,0x05,0x02,0x00]
				0x00,0x01,0x10,0xf0,0x01,0x05,0x02,0x00

				# GFX10: image_load_mip_pck_sgn v[4:5], v[0:1], s[8:15] dmask:0x5 dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x05,0x14,0xf0,0x00,0x04,0x02,0x00]
				0x00,0x05,0x14,0xf0,0x00,0x04,0x02,0x00

				# GFX10: image_store v[192:194], v238, s[28:35] dmask:0x7 dim:SQ_RSRC_IMG_1D unorm ; encoding: [0x00,0x17,0x20,0xf0,0xee,0xc0,0x07,0x00]
				0x00,0x17,0x20,0xf0,0xee,0xc0,0x07,0x00

				# GFX10: image_store_pck v1, v2, s[12:19] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm ; encoding: [0x00,0x11,0x28,0xf0,0x02,0x01,0x03,0x00]
				0x00,0x51,0x28,0xf0,0x02,0x01,0x03,0x00

				# GFX10: image_store_mip v1, v[2:3], s[12:19] dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x00,0x24,0xf0,0x02,0x01,0x03,0x00]
				0x00,0x00,0x24,0xf0,0x02,0x01,0x03,0x00

				# GFX10: image_store_mip_pck v252, v[2:3], s[12:19] dmask:0x1 dim:SQ_RSRC_IMG_1D r128 ; encoding: [0x00,0x81,0x2c,0xf0,0x02,0xfc,0x03,0x00]
				0x00,0x81,0x2c,0xf0,0x02,0xfc,0x03,0x00

				# GFX10: image_atomic_sub v4, v192, s[28:35] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm glc ; encoding: [0x00,0x31,0x48,0xf0,0xc0,0x04,0x07,0x00]
				0x00,0x31,0x48,0xf0,0xc0,0x04,0x07,0x00

				# GFX10: image_atomic_and v4, v192, s[28:35] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm ; encoding: [0x00,0x11,0x60,0xf0,0xc0,0x04,0x07,0x00]
				0x00,0x11,0x60,0xf0,0xc0,0x04,0x07,0x00

				# GFX10: image_atomic_cmpswap v[4:5], v192, s[28:35] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm glc ; encoding: [0x00,0x31,0x40,0xf0,0xc0,0x04,0x07,0x00]
				0x00,0x31,0x40,0xf0,0xc0,0x04,0x07,0x00

				# GFX10: image_atomic_add v[4:5], v192, s[28:35] dmask:0x3 dim:SQ_RSRC_IMG_1D unorm glc ; encoding: [0x00,0x33,0x44,0xf0,0xc0,0x04,0x07,0x00]
				0x00,0x33,0x44,0xf0,0xc0,0x04,0x07,0x00

				# GFX10: image_atomic_or v4, v192, s[28:35] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm ; encoding: [0x00,0x11,0x64,0xf0,0xc0,0x04,0x07,0x00]
				0x00,0x11,0x64,0xf0,0xc0,0x04,0x07,0x00

				# GFX10: image_atomic_xor v4, v192, s[28:35] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm ; encoding: [0x00,0x11,0x68,0xf0,0xc0,0x04,0x07,0x00]
				0x00,0x11,0x68,0xf0,0xc0,0x04,0x07,0x00

				# GFX10: image_atomic_sub v4, v192, s[28:35] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm ; encoding: [0x00,0x11,0x48,0xf0,0xc0,0x04,0x07,0x00]
				0x00,0x11,0x48,0xf0,0xc0,0x04,0x07,0x00

				# GFX10: image_atomic_smin v4, v192, s[28:35] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm ; encoding: [0x00,0x11,0x50,0xf0,0xc0,0x04,0x07,0x00]
				0x00,0x11,0x50,0xf0,0xc0,0x04,0x07,0x00

				# GFX10: image_atomic_smax v4, v192, s[28:35] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm ; encoding: [0x00,0x11,0x58,0xf0,0xc0,0x04,0x07,0x00]
				0x00,0x11,0x58,0xf0,0xc0,0x04,0x07,0x00

				# GFX10: image_atomic_umin v4, v192, s[28:35] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm ; encoding: [0x00,0x11,0x54,0xf0,0xc0,0x04,0x07,0x00]
				0x00,0x11,0x54,0xf0,0xc0,0x04,0x07,0x00

				# GFX10: image_atomic_umax v4, v192, s[28:35] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm ; encoding: [0x00,0x11,0x5c,0xf0,0xc0,0x04,0x07,0x00]
				0x00,0x11,0x5c,0xf0,0xc0,0x04,0x07,0x00

				# GFX10: image_atomic_inc v4, v192, s[28:35] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm ; encoding: [0x00,0x11,0x6c,0xf0,0xc0,0x04,0x07,0x00]
				0x00,0x11,0x6c,0xf0,0xc0,0x04,0x07,0x00

				# GFX10: image_atomic_dec v4, v192, s[28:35] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm ; encoding: [0x00,0x11,0x70,0xf0,0xc0,0x04,0x07,0x00]
				0x00,0x11,0x70,0xf0,0xc0,0x04,0x07,0x00

				# GFX10: image_get_resinfo v5, v1, s[8:15] dmask:0x1 dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x01,0x38,0xf0,0x01,0x05,0x02,0x00]
				0x00,0x01,0x38,0xf0,0x01,0x05,0x02,0x00

				# GFX10: image_sample v5, v0, s[8:15], s[12:15] dmask:0x1 dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x01,0x80,0xf0,0x00,0x05,0x62,0x00]
				0x00,0x01,0x80,0xf0,0x00,0x05,0x62,0x00

				# GFX10: image_load v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm a16 ; encoding: [0x08,0x1f,0x00,0xf0,0x00,0x00,0x00,0x40]
				0x08,0x1f,0x00,0xf0,0x00,0x00,0x00,0x40

				# GFX10: image_load v[0:4], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm tfe ; encoding: [0x08,0x1f,0x01,0xf0,0x00,0x00,0x00,0x00]
				0x08,0x1f,0x01,0xf0,0x00,0x00,0x00,0x00

				# GFX10: image_load v[0:4], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm a16 tfe ; encoding: [0x08,0x1f,0x01,0xf0,0x00,0x00,0x00,0x40]
				0x08,0x1f,0x01,0xf0,0x00,0x00,0x00,0x40

				# GFX10: image_load v1, v1, s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D a16 ; encoding: [0x08,0x01,0x00,0xf0,0x01,0x01,0x04,0x40]
				0x08,0x01,0x00,0xf0,0x01,0x01,0x04,0x40

				# GFX10: image_load v[1:2], v1, s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D a16 tfe ; encoding: [0x08,0x01,0x01,0xf0,0x01,0x01,0x04,0x40]
				0x08,0x01,0x01,0xf0,0x01,0x01,0x04,0x40

				# GFX10: image_load v1, v1, s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D a16 lwe ; encoding: [0x08,0x01,0x02,0xf0,0x01,0x01,0x04,0x40]
				0x08,0x01,0x02,0xf0,0x01,0x01,0x04,0x40

				# GFX10: image_load v[1:2], v1, s[16:23] dmask:0x1 dim:SQ_RSRC_IMG_2D a16 tfe lwe ; encoding: [0x08,0x01,0x03,0xf0,0x01,0x01,0x04,0x40]
				0x08,0x01,0x03,0xf0,0x01,0x01,0x04,0x40

				# GFX10: image_load v[1:2], v1, s[16:23] dmask:0x3 dim:SQ_RSRC_IMG_2D a16 ; encoding: [0x08,0x03,0x00,0xf0,0x01,0x01,0x04,0x40]
				0x08,0x03,0x00,0xf0,0x01,0x01,0x04,0x40

				# GFX10: image_load v[1:4], v1, s[16:23] dmask:0x7 dim:SQ_RSRC_IMG_2D a16 tfe ; encoding: [0x08,0x07,0x01,0xf0,0x01,0x01,0x04,0x40]
				0x08,0x07,0x01,0xf0,0x01,0x01,0x04,0x40

				# GFX10: image_load v[1:4], v1, s[16:23] dmask:0xf dim:SQ_RSRC_IMG_2D a16 lwe ; encoding: [0x08,0x0f,0x02,0xf0,0x01,0x01,0x04,0x40]
				0x08,0x0f,0x02,0xf0,0x01,0x01,0x04,0x40

				# GFX10: image_load v[1:3], v1, s[16:23] dmask:0x5 dim:SQ_RSRC_IMG_2D a16 tfe lwe ; encoding: [0x08,0x05,0x03,0xf0,0x01,0x01,0x04,0x40]
				0x08,0x05,0x03,0xf0,0x01,0x01,0x04,0x40

[AMDGPU][Disassembler] Adjust img instruction address field if a16 present
Abandoned, Public
