This is an archive of the discontinued LLVM Phabricator instance.


Differential D48017

AMDGPU: Select MIMG instructions manually in SITargetLowering
ClosedPublic

Authored by nhaehnle on Jun 11 2018, 5:50 AM.

Details

Reviewers

arsenm
rampitec
rtaylor
tstellar

Commits

rG7a9c03f484fe: AMDGPU: Select MIMG instructions manually in SITargetLowering
rL335228: AMDGPU: Select MIMG instructions manually in SITargetLowering

Summary

Having TableGen patterns for image intrinsics is hitting limitations:
for D16 we already have to manually pre-lower the packing of data
values, and we will have to do the same for A16 eventually.

Since there is already some custom C++ code anyway, it is arguably easier
to just do everything in C++, now that we can use the beefed-up generic
tables backend of TableGen to provide all the required metadata and map
intrinsics to corresponding opcodes. With this approach, all image
intrinsic lowering happens in SITargetLowering::lowerImage. That code is
dense due to all the cases that it handles, but it should still be easier
to follow than what we had before, by virtue of it all being done in a
single location, and by virtue of not relying on the TableGen pattern
magic that very few people really understand.

This means that we will have MachineSDNodes with MIMG instructions
during DAG combining, but that seems alright: previously we had
intrinsic nodes instead, but those are similarly opaque to the generic
CodeGen infrastructure, and the final pattern matching just did a 1:1
translation to machine instructions anyway. If anything, the fact that
we now merge the address words into a vector before DAG combine should
be an advantage.

Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6
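
To illustrate the table-driven mapping the summary describes, here is a minimal standalone sketch of a sorted lookup table keyed by intrinsic ID. It is not the code TableGen generates: the enum values, the intrinsic IDs, and the `Opcode`/`D` field names are hypothetical, and the range check is only an assumption about what `PrimaryKeyEarlyOut` is meant to achieve.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

// Hypothetical stand-ins for the generated enums; values are illustrative.
enum class BaseOpcode : uint8_t { ImageLoad, ImageSample, ImageGetLod };
enum class Dim : uint8_t { Dim1D, Dim2D, Dim3D };

struct ImageDimIntrinsicInfo {
  unsigned Intr;     // intrinsic ID, used as the primary key
  BaseOpcode Opcode; // ISA-level opcode, independent of vdata/vaddr size
  Dim D;             // image dimensionality
};

// Entries are sorted by Intr so that lookup can use binary search.
// The IDs below are made up for the example.
static const ImageDimIntrinsicInfo Table[] = {
    {100, BaseOpcode::ImageLoad, Dim::Dim1D},
    {101, BaseOpcode::ImageLoad, Dim::Dim2D},
    {205, BaseOpcode::ImageSample, Dim::Dim2D},
    {310, BaseOpcode::ImageGetLod, Dim::Dim3D},
};

const ImageDimIntrinsicInfo *getImageDimIntrinsicInfo(unsigned Intr) {
  // Cheap rejection of IDs outside the table's key range.
  if (Intr < std::begin(Table)->Intr ||
      Intr > std::prev(std::end(Table))->Intr)
    return nullptr;

  const ImageDimIntrinsicInfo *It = std::lower_bound(
      std::begin(Table), std::end(Table), Intr,
      [](const ImageDimIntrinsicInfo &E, unsigned Key) {
        return E.Intr < Key;
      });
  if (It == std::end(Table) || It->Intr != Intr)
    return nullptr;
  return It;
}
```

A caller would first ask the table whether an intrinsic is an image intrinsic at all, and only on a non-null result hand the entry to a single lowering routine, which is the role `SITargetLowering::lowerImage` plays in this change.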

Diff Detail

Repository

rL LLVM

Build Status

Buildable 19462
Build 19462: arc lint + arc unit

Event Timeline

nhaehnle created this revision. Jun 11 2018, 5:50 AM

Herald added subscribers: t-tye, tpr, dstuttard and 3 others. Jun 11 2018, 5:50 AM

Harbormaster completed remote builds in B19162: Diff 150726. Jun 11 2018, 5:50 AM

nhaehnle added a parent revision: D48016: AMDGPU: Refactor MIMG instruction TableGen using generic tables. Jun 11 2018, 5:50 AM

nhaehnle added a child revision: D48018: AMDGPU: Convert test cases to the dimension-aware intrinsics.

Thank you! I am in favor of this change, despite what we discussed today. If we eventually want to create target ISD nodes, that could be done on top of it and separately, but currently I see no such need.
A separate note: SIISelLowering.cpp has really grown too large; it is already about 8000 lines. Maybe we should consider splitting it into separate pieces as a separate change. This code, for example, could live in something like SIImageLowering.cpp.

This revision is now accepted and ready to land. Jun 11 2018, 10:34 AM

arsenm added inline comments. Jun 11 2018, 12:43 PM

lib/Target/AMDGPU/SIISelLowering.cpp
4620–4621	.ScalarType() == s16
4622	I think we already have a hasD16 feature?
4642	I'm going to be committing that patch soon so might as well just wait for that
4665	Isn't this always the case?
4680–4681	i1?

Address review comments

nhaehnle added inline comments. Jun 14 2018, 5:37 AM

lib/Target/AMDGPU/SIISelLowering.cpp
4620–4621	Done.
4622	I don't see one. There's a hasUnpackedD16VMem(), but that just says whether `D16` is unpacked or packed (if it exists). FWIW, I'm not a fan of having explicit feature flags for features that are already clearly delineated by hardware generations, which is the case for `D16`.
4642	Sure, will do. There's a few more changes to come anyway, and I want to submit them all at once to reduce merge conflicts with our internal branches.
4665	No. getlod and getresinfo don't access memory and are INTRINSIC_WO_CHAIN.
4680–4681	Good point, changed. It doesn't seem like any downstream users care...
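
The packed/unpacked distinction discussed for line 4622 can be sketched as follows. This is a standalone illustration, not code from the patch: `layoutD16` is a hypothetical helper, and zero-filling the unused upper halves is an assumption. The dword counts match the removed patterns in MIMGInstructions.td, where a four-element D16 result uses `_V4` on unpacked targets and `_V2` on packed ones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lay out 16-bit data elements in 32-bit dwords.
// Unpacked: one element per dword, in the low half.
// Packed:   two elements per dword, the first one in the low half.
std::vector<uint32_t> layoutD16(const std::vector<uint16_t> &Elts,
                                bool Unpacked) {
  std::vector<uint32_t> Dwords;
  if (Unpacked) {
    for (uint16_t E : Elts)
      Dwords.push_back(E); // upper 16 bits left as zero (assumption)
    return Dwords;
  }
  for (std::size_t I = 0; I < Elts.size(); I += 2) {
    uint32_t Lo = Elts[I];
    // An odd trailing element gets a zero upper half (assumption).
    uint32_t Hi = (I + 1 < Elts.size()) ? Elts[I + 1] : 0;
    Dwords.push_back(Lo | (Hi << 16));
  }
  return Dwords;
}
```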

nhaehnle added a child revision: D48165: InstCombine/AMDGPU: Add dimension-aware image intrinsics to SimplifyDemanded. Jun 14 2018, 6:19 AM

Rebased.

Harbormaster completed remote builds in B19462: Diff 151872. Jun 19 2018, 2:21 AM

Closed by commit rL335228: AMDGPU: Select MIMG instructions manually in SITargetLowering (authored by nha). Jun 21 2018, 6:41 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                   Size
include/llvm/IR/IntrinsicsAMDGPU.td                    20 lines
lib/Target/AMDGPU/AMDGPUInstrInfo.h                    7 lines
lib/Target/AMDGPU/AMDGPUInstrInfo.cpp                  5 lines
lib/Target/AMDGPU/AMDGPUSearchableTables.td            21 lines
lib/Target/AMDGPU/MIMGInstructions.td                  222 lines
lib/Target/AMDGPU/SIISelLowering.h                     2 lines
lib/Target/AMDGPU/SIISelLowering.cpp                   277 lines
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h               32 lines
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp             9 lines
test/CodeGen/AMDGPU/llvm.amdgcn.image.getlod.dim.ll    34 lines

Diff 151872

include/llvm/IR/IntrinsicsAMDGPU.td

(600 lines not shown)	!foldl([]<AMDGPUArg>, arglists, lhs, rhs,
lhs,		lhs,
arglistmatchshift<rhs,		arglistmatchshift<rhs,
!add(shift, !foldl(0, lhs, a, b,		!add(shift, !foldl(0, lhs, a, b,
!add(a, b.Type.isAny)))>.ret));		!add(a, b.Type.isAny)))>.ret));
}		}

// Represent texture/image types / dimensionality.		// Represent texture/image types / dimensionality.
class AMDGPUDimProps<string name, list<string> coord_names, list<string> slice_names> {		class AMDGPUDimProps<string name, list<string> coord_names, list<string> slice_names> {
		AMDGPUDimProps Dim = !cast<AMDGPUDimProps>(NAME);
string Name = name; // e.g. "2darraymsaa"		string Name = name; // e.g. "2darraymsaa"
bit DA = 0; // DA bit in MIMG encoding		bit DA = 0; // DA bit in MIMG encoding

list<AMDGPUArg> CoordSliceArgs =		list<AMDGPUArg> CoordSliceArgs =
makeArgList<!listconcat(coord_names, slice_names), llvm_anyfloat_ty>.ret;		makeArgList<!listconcat(coord_names, slice_names), llvm_anyfloat_ty>.ret;
list<AMDGPUArg> CoordSliceIntArgs =		list<AMDGPUArg> CoordSliceIntArgs =
makeArgList<!listconcat(coord_names, slice_names), llvm_anyint_ty>.ret;		makeArgList<!listconcat(coord_names, slice_names), llvm_anyint_ty>.ret;
list<AMDGPUArg> GradientArgs =		list<AMDGPUArg> GradientArgs =
makeArgList<!listconcat(!foreach(name, coord_names, "d" # name # "dh"),		makeArgList<!listconcat(!foreach(name, coord_names, "d" # name # "dh"),
!foreach(name, coord_names, "d" # name # "dv")),		!foreach(name, coord_names, "d" # name # "dv")),
llvm_anyfloat_ty>.ret;		llvm_anyfloat_ty>.ret;

		bits<8> NumCoords = !size(CoordSliceArgs);
		bits<8> NumGradients = !size(GradientArgs);
}		}

def AMDGPUDim1D : AMDGPUDimProps<"1d", ["s"], []>;		def AMDGPUDim1D : AMDGPUDimProps<"1d", ["s"], []>;
def AMDGPUDim2D : AMDGPUDimProps<"2d", ["s", "t"], []>;		def AMDGPUDim2D : AMDGPUDimProps<"2d", ["s", "t"], []>;
def AMDGPUDim3D : AMDGPUDimProps<"3d", ["s", "t", "r"], []>;		def AMDGPUDim3D : AMDGPUDimProps<"3d", ["s", "t", "r"], []>;
let DA = 1 in {		let DA = 1 in {
def AMDGPUDimCube : AMDGPUDimProps<"cube", ["s", "t"], ["face"]>;		def AMDGPUDimCube : AMDGPUDimProps<"cube", ["s", "t"], ["face"]>;
def AMDGPUDim1DArray : AMDGPUDimProps<"1darray", ["s"], ["slice"]>;		def AMDGPUDim1DArray : AMDGPUDimProps<"1darray", ["s"], ["slice"]>;
(234 lines not shown)	defset list<AMDGPUImageDimIntrinsic> AMDGPUImageDimIntrinsics = {
}		}

foreach sample = AMDGPUSampleVariants in {		foreach sample = AMDGPUSampleVariants in {
defm int_amdgcn_image_sample # sample.LowerCaseMod :		defm int_amdgcn_image_sample # sample.LowerCaseMod :
AMDGPUImageDimSampleDims<"SAMPLE" # sample.UpperCaseMod, sample>;		AMDGPUImageDimSampleDims<"SAMPLE" # sample.UpperCaseMod, sample>;
}		}

defm int_amdgcn_image_getlod : AMDGPUImageDimSampleDims<"GET_LOD", AMDGPUSample, 1>;		defm int_amdgcn_image_getlod : AMDGPUImageDimSampleDims<"GET_LOD", AMDGPUSample, 1>;
}

//////////////////////////////////////////////////////////////////////////		//////////////////////////////////////////////////////////////////////////
// getresinfo intrinsics (separate due to D16)		// getresinfo intrinsics
//////////////////////////////////////////////////////////////////////////		//////////////////////////////////////////////////////////////////////////
defset list<AMDGPUImageDimIntrinsic> AMDGPUImageDimGetResInfoIntrinsics = {
foreach dim = AMDGPUDims.All in {		foreach dim = AMDGPUDims.All in {
def !strconcat("int_amdgcn_image_getresinfo_", dim.Name)		def !strconcat("int_amdgcn_image_getresinfo_", dim.Name)
: AMDGPUImageDimIntrinsic<AMDGPUDimGetResInfoProfile<dim>, [IntrNoMem], []>;		: AMDGPUImageDimIntrinsic<AMDGPUDimGetResInfoProfile<dim>, [IntrNoMem], []>;
}		}
}

//////////////////////////////////////////////////////////////////////////		//////////////////////////////////////////////////////////////////////////
// gather4 intrinsics		// gather4 intrinsics
//////////////////////////////////////////////////////////////////////////		//////////////////////////////////////////////////////////////////////////
defset list<AMDGPUImageDimIntrinsic> AMDGPUImageDimGatherIntrinsics = {
foreach sample = AMDGPUSampleVariantsNoGradients in {		foreach sample = AMDGPUSampleVariantsNoGradients in {
foreach dim = [AMDGPUDim2D, AMDGPUDimCube, AMDGPUDim2DArray] in {		foreach dim = [AMDGPUDim2D, AMDGPUDimCube, AMDGPUDim2DArray] in {
def int_amdgcn_image_gather4 # sample.LowerCaseMod # _ # dim.Name:		def int_amdgcn_image_gather4 # sample.LowerCaseMod # _ # dim.Name:
AMDGPUImageDimIntrinsic<		AMDGPUImageDimIntrinsic<
AMDGPUDimSampleProfile<"GATHER4" # sample.UpperCaseMod, dim, sample>,		AMDGPUDimSampleProfile<"GATHER4" # sample.UpperCaseMod, dim, sample>,
[IntrReadMem], [SDNPMemOperand]>;		[IntrReadMem], [SDNPMemOperand]>;
}		}
}		}
(589 lines not shown)

lib/Target/AMDGPU/AMDGPUInstrInfo.h

	(63 lines not shown)
	const RsrcIntrinsic *lookupRsrcIntrinsic(unsigned Intr);			const RsrcIntrinsic *lookupRsrcIntrinsic(unsigned Intr);

	struct D16ImageDimIntrinsic {			struct D16ImageDimIntrinsic {
	unsigned Intr;			unsigned Intr;
	unsigned D16HelperIntr;			unsigned D16HelperIntr;
	};			};
	const D16ImageDimIntrinsic *lookupD16ImageDimIntrinsic(unsigned Intr);			const D16ImageDimIntrinsic *lookupD16ImageDimIntrinsic(unsigned Intr);

				struct ImageDimIntrinsicInfo {
				unsigned Intr;
				unsigned BaseOpcode;
				MIMGDim Dim;
				};
				const ImageDimIntrinsicInfo *getImageDimIntrinsicInfo(unsigned Intr);

	} // end AMDGPU namespace			} // end AMDGPU namespace
	} // End llvm namespace			} // End llvm namespace

	#endif			#endif

lib/Target/AMDGPU/AMDGPUInstrInfo.cpp

	(22 lines not shown)

	using namespace llvm;			using namespace llvm;

	#define GET_INSTRINFO_CTOR_DTOR			#define GET_INSTRINFO_CTOR_DTOR
	#include "AMDGPUGenInstrInfo.inc"			#include "AMDGPUGenInstrInfo.inc"

	namespace llvm {			namespace llvm {
	namespace AMDGPU {			namespace AMDGPU {
	#define GET_RsrcIntrinsics_IMPL
	#include "AMDGPUGenSearchableTables.inc"

	#define GET_D16ImageDimIntrinsics_IMPL			#define GET_D16ImageDimIntrinsics_IMPL
				#define GET_ImageDimIntrinsicTable_IMPL
				#define GET_RsrcIntrinsics_IMPL
	#include "AMDGPUGenSearchableTables.inc"			#include "AMDGPUGenSearchableTables.inc"
	}			}
	}			}

	// Pin the vtable to this file.			// Pin the vtable to this file.
	void AMDGPUInstrInfo::anchor() {}			void AMDGPUInstrInfo::anchor() {}

	AMDGPUInstrInfo::AMDGPUInstrInfo(const AMDGPUSubtarget &ST)			AMDGPUInstrInfo::AMDGPUInstrInfo(const AMDGPUSubtarget &ST)
	(106 lines not shown)

lib/Target/AMDGPU/AMDGPUSearchableTables.td

(22 lines not shown)	def RsrcIntrinsics : GenericTable {

let PrimaryKey = ["Intr"];		let PrimaryKey = ["Intr"];
let PrimaryKeyName = "lookupRsrcIntrinsic";		let PrimaryKeyName = "lookupRsrcIntrinsic";
}		}

foreach intr = !listconcat(AMDGPUBufferIntrinsics,		foreach intr = !listconcat(AMDGPUBufferIntrinsics,
AMDGPUImageIntrinsics,		AMDGPUImageIntrinsics,
AMDGPUImageDimIntrinsics,		AMDGPUImageDimIntrinsics,
AMDGPUImageDimGatherIntrinsics,
AMDGPUImageDimGetResInfoIntrinsics,
AMDGPUImageDimAtomicIntrinsics) in {		AMDGPUImageDimAtomicIntrinsics) in {
def : RsrcIntrinsic<!cast<AMDGPURsrcIntrinsic>(intr)>;		def : RsrcIntrinsic<!cast<AMDGPURsrcIntrinsic>(intr)>;
}		}

class SourceOfDivergence<Intrinsic intr> {		class SourceOfDivergence<Intrinsic intr> {
Intrinsic Intr = intr;		Intrinsic Intr = intr;
}		}

(45 lines not shown)
def : SourceOfDivergence<int_amdgcn_buffer_atomic_or>;		def : SourceOfDivergence<int_amdgcn_buffer_atomic_or>;
def : SourceOfDivergence<int_amdgcn_buffer_atomic_xor>;		def : SourceOfDivergence<int_amdgcn_buffer_atomic_xor>;
def : SourceOfDivergence<int_amdgcn_buffer_atomic_cmpswap>;		def : SourceOfDivergence<int_amdgcn_buffer_atomic_cmpswap>;
def : SourceOfDivergence<int_amdgcn_ps_live>;		def : SourceOfDivergence<int_amdgcn_ps_live>;
def : SourceOfDivergence<int_amdgcn_ds_swizzle>;		def : SourceOfDivergence<int_amdgcn_ds_swizzle>;

foreach intr = AMDGPUImageDimAtomicIntrinsics in		foreach intr = AMDGPUImageDimAtomicIntrinsics in
def : SourceOfDivergence<intr>;		def : SourceOfDivergence<intr>;

class D16ImageDimIntrinsic<AMDGPUImageDimIntrinsic intr> {
Intrinsic Intr = intr;
code D16HelperIntr =
!cast<code>("AMDGPUIntrinsic::SI_image_d16helper_" # intr.P.OpMod # intr.P.Dim.Name);
}

def D16ImageDimIntrinsics : GenericTable {
let FilterClass = "D16ImageDimIntrinsic";
let Fields = ["Intr", "D16HelperIntr"];

let PrimaryKey = ["Intr"];
let PrimaryKeyName = "lookupD16ImageDimIntrinsic";
}

foreach intr = !listconcat(AMDGPUImageDimIntrinsics,
AMDGPUImageDimGatherIntrinsics) in {
def : D16ImageDimIntrinsic<intr>;
}

lib/Target/AMDGPU/MIMGInstructions.td

(21 lines not shown)
def MIMGEncoding : GenericEnum {		def MIMGEncoding : GenericEnum {
let FilterClass = "MIMGEncoding";		let FilterClass = "MIMGEncoding";
}		}

// Represent an ISA-level opcode, independent of the encoding and the		// Represent an ISA-level opcode, independent of the encoding and the
// vdata/vaddr size.		// vdata/vaddr size.
class MIMGBaseOpcode {		class MIMGBaseOpcode {
MIMGBaseOpcode BaseOpcode = !cast<MIMGBaseOpcode>(NAME);		MIMGBaseOpcode BaseOpcode = !cast<MIMGBaseOpcode>(NAME);
		bit Store = 0;
		bit Atomic = 0;
		bit AtomicX2 = 0; // (f)cmpswap
		bit Sampler = 0;
bits<8> NumExtraArgs = 0;		bits<8> NumExtraArgs = 0;
bit Gradients = 0;		bit Gradients = 0;
bit Coordinates = 1;		bit Coordinates = 1;
bit LodOrClampOrMip = 0;		bit LodOrClampOrMip = 0;
bit HasD16 = 0;		bit HasD16 = 0;
}		}

def MIMGBaseOpcode : GenericEnum {		def MIMGBaseOpcode : GenericEnum {
let FilterClass = "MIMGBaseOpcode";		let FilterClass = "MIMGBaseOpcode";
}		}

def MIMGBaseOpcodesTable : GenericTable {		def MIMGBaseOpcodesTable : GenericTable {
let FilterClass = "MIMGBaseOpcode";		let FilterClass = "MIMGBaseOpcode";
let CppTypeName = "MIMGBaseOpcodeInfo";		let CppTypeName = "MIMGBaseOpcodeInfo";
let Fields = ["BaseOpcode", "NumExtraArgs", "Gradients", "Coordinates",		let Fields = ["BaseOpcode", "Store", "Atomic", "AtomicX2", "Sampler",
"LodOrClampOrMip", "HasD16"];		"NumExtraArgs", "Gradients", "Coordinates", "LodOrClampOrMip",
		"HasD16"];
GenericEnum TypeOf_BaseOpcode = MIMGBaseOpcode;		GenericEnum TypeOf_BaseOpcode = MIMGBaseOpcode;

let PrimaryKey = ["BaseOpcode"];		let PrimaryKey = ["BaseOpcode"];
let PrimaryKeyName = "getMIMGBaseOpcodeInfo";		let PrimaryKeyName = "getMIMGBaseOpcodeInfo";
}		}

		def MIMGDim : GenericEnum {
		let FilterClass = "AMDGPUDimProps";
		}

		def MIMGDimInfoTable : GenericTable {
		let FilterClass = "AMDGPUDimProps";
		let CppTypeName = "MIMGDimInfo";
		let Fields = ["Dim", "NumCoords", "NumGradients", "DA"];
		GenericEnum TypeOf_Dim = MIMGDim;

		let PrimaryKey = ["Dim"];
		let PrimaryKeyName = "getMIMGDimInfo";
		}

class mimg <bits<7> si, bits<7> vi = si> {		class mimg <bits<7> si, bits<7> vi = si> {
field bits<7> SI = si;		field bits<7> SI = si;
field bits<7> VI = vi;		field bits<7> VI = vi;
}		}

class MIMG <dag outs, string dns = "">		class MIMG <dag outs, string dns = "">
: InstSI <outs, (ins), "", []> {		: InstSI <outs, (ins), "", []> {

(123 lines not shown)	multiclass MIMG_Store_Addr_Helper <bits<7> op, string asm,
let VAddrDwords = 3 in		let VAddrDwords = 3 in
def NAME # _V3 : MIMG_Store_Helper <op, asm, data_rc, VReg_96>;		def NAME # _V3 : MIMG_Store_Helper <op, asm, data_rc, VReg_96>;
let VAddrDwords = 4 in		let VAddrDwords = 4 in
def NAME # _V4 : MIMG_Store_Helper <op, asm, data_rc, VReg_128>;		def NAME # _V4 : MIMG_Store_Helper <op, asm, data_rc, VReg_128>;
}		}

multiclass MIMG_Store <bits<7> op, string asm, bit has_d16, bit mip = 0> {		multiclass MIMG_Store <bits<7> op, string asm, bit has_d16, bit mip = 0> {
def "" : MIMGBaseOpcode {		def "" : MIMGBaseOpcode {
		let Store = 1;
let LodOrClampOrMip = mip;		let LodOrClampOrMip = mip;
let HasD16 = has_d16;		let HasD16 = has_d16;
}		}

let BaseOpcode = !cast<MIMGBaseOpcode>(NAME) in {		let BaseOpcode = !cast<MIMGBaseOpcode>(NAME) in {
let VDataDwords = 1 in		let VDataDwords = 1 in
defm _V1 : MIMG_Store_Addr_Helper <op, asm, VGPR_32, 1>;		defm _V1 : MIMG_Store_Addr_Helper <op, asm, VGPR_32, 1>;
let VDataDwords = 2 in		let VDataDwords = 2 in
(59 lines not shown)	multiclass MIMG_Atomic_Addr_Helper_m <mimg op, string asm,
defm _V2 : MIMG_Atomic_Helper_m <op, asm, data_rc, VReg_64>;		defm _V2 : MIMG_Atomic_Helper_m <op, asm, data_rc, VReg_64>;
let VAddrDwords = 3 in		let VAddrDwords = 3 in
defm _V3 : MIMG_Atomic_Helper_m <op, asm, data_rc, VReg_96>;		defm _V3 : MIMG_Atomic_Helper_m <op, asm, data_rc, VReg_96>;
let VAddrDwords = 4 in		let VAddrDwords = 4 in
defm _V4 : MIMG_Atomic_Helper_m <op, asm, data_rc, VReg_128>;		defm _V4 : MIMG_Atomic_Helper_m <op, asm, data_rc, VReg_128>;
}		}

multiclass MIMG_Atomic <mimg op, string asm, bit isCmpSwap = 0> { // 64-bit atomics		multiclass MIMG_Atomic <mimg op, string asm, bit isCmpSwap = 0> { // 64-bit atomics
def "" : MIMGBaseOpcode;		def "" : MIMGBaseOpcode {
		let Atomic = 1;
		let AtomicX2 = isCmpSwap;
		}

let BaseOpcode = !cast<MIMGBaseOpcode>(NAME) in {		let BaseOpcode = !cast<MIMGBaseOpcode>(NAME) in {
// _V* variants have different dst size, but the size is encoded implicitly,		// _V* variants have different dst size, but the size is encoded implicitly,
// using dmask and tfe. Only 32-bit variant is registered with disassembler.		// using dmask and tfe. Only 32-bit variant is registered with disassembler.
// Other variants are reconstructed by disassembler using dmask and tfe.		// Other variants are reconstructed by disassembler using dmask and tfe.
let VDataDwords = !if(isCmpSwap, 2, 1) in		let VDataDwords = !if(isCmpSwap, 2, 1) in
defm _V1 : MIMG_Atomic_Addr_Helper_m <op, asm, !if(isCmpSwap, VReg_64, VGPR_32), 1>;		defm _V1 : MIMG_Atomic_Addr_Helper_m <op, asm, !if(isCmpSwap, VReg_64, VGPR_32), 1>;
let VDataDwords = !if(isCmpSwap, 4, 2) in		let VDataDwords = !if(isCmpSwap, 4, 2) in
(29 lines not shown)	multiclass MIMG_Sampler_Src_Helper <bits<7> op, string asm, RegisterClass dst_rc,
let VAddrDwords = 8 in		let VAddrDwords = 8 in
def _V8 : MIMG_Sampler_Helper <op, asm, dst_rc, VReg_256>;		def _V8 : MIMG_Sampler_Helper <op, asm, dst_rc, VReg_256>;
let VAddrDwords = 16 in		let VAddrDwords = 16 in
def _V16 : MIMG_Sampler_Helper <op, asm, dst_rc, VReg_512>;		def _V16 : MIMG_Sampler_Helper <op, asm, dst_rc, VReg_512>;
}		}

class MIMG_Sampler_BaseOpcode<AMDGPUSampleVariant sample>		class MIMG_Sampler_BaseOpcode<AMDGPUSampleVariant sample>
: MIMGBaseOpcode {		: MIMGBaseOpcode {
		let Sampler = 1;
let NumExtraArgs = !size(sample.ExtraAddrArgs);		let NumExtraArgs = !size(sample.ExtraAddrArgs);
let Gradients = sample.Gradients;		let Gradients = sample.Gradients;
let LodOrClampOrMip = !ne(sample.LodOrClamp, "");		let LodOrClampOrMip = !ne(sample.LodOrClamp, "");
}		}

multiclass MIMG_Sampler <bits<7> op, AMDGPUSampleVariant sample, bit wqm = 0,		multiclass MIMG_Sampler <bits<7> op, AMDGPUSampleVariant sample, bit wqm = 0,
bit isGetLod = 0,		bit isGetLod = 0,
string asm = "image_sample"#sample.LowerCaseMod> {		string asm = "image_sample"#sample.LowerCaseMod> {
(133 lines not shown)
defm IMAGE_SAMPLE_C_CD_CL : MIMG_Sampler <0x0000006b, AMDGPUSample_c_cd_cl>;		defm IMAGE_SAMPLE_C_CD_CL : MIMG_Sampler <0x0000006b, AMDGPUSample_c_cd_cl>;
defm IMAGE_SAMPLE_CD_O : MIMG_Sampler <0x0000006c, AMDGPUSample_cd_o>;		defm IMAGE_SAMPLE_CD_O : MIMG_Sampler <0x0000006c, AMDGPUSample_cd_o>;
defm IMAGE_SAMPLE_CD_CL_O : MIMG_Sampler <0x0000006d, AMDGPUSample_cd_cl_o>;		defm IMAGE_SAMPLE_CD_CL_O : MIMG_Sampler <0x0000006d, AMDGPUSample_cd_cl_o>;
defm IMAGE_SAMPLE_C_CD_O : MIMG_Sampler <0x0000006e, AMDGPUSample_c_cd_o>;		defm IMAGE_SAMPLE_C_CD_O : MIMG_Sampler <0x0000006e, AMDGPUSample_c_cd_o>;
defm IMAGE_SAMPLE_C_CD_CL_O : MIMG_Sampler <0x0000006f, AMDGPUSample_c_cd_cl_o>;		defm IMAGE_SAMPLE_C_CD_CL_O : MIMG_Sampler <0x0000006f, AMDGPUSample_c_cd_cl_o>;
//def IMAGE_RSRC256 : MIMG_NoPattern_RSRC256 <"image_rsrc256", 0x0000007e>;		//def IMAGE_RSRC256 : MIMG_NoPattern_RSRC256 <"image_rsrc256", 0x0000007e>;
//def IMAGE_SAMPLER : MIMG_NoPattern_ <"image_sampler", 0x0000007f>;		//def IMAGE_SAMPLER : MIMG_NoPattern_ <"image_sampler", 0x0000007f>;

/******** ============================== ********/		/******** ========================================= ********/
/******** Dimension-aware image patterns ********/		/******** Table of dimension-aware image intrinsics ********/
/******** ============================== ********/		/******** ========================================= ********/

class getDwordsType<int dwords> {		class ImageDimIntrinsicInfo<AMDGPUImageDimIntrinsic I> {
int NumDwords = dwords;		Intrinsic Intr = I;
string suffix = !if(!lt(dwords, 1), ?,		MIMGBaseOpcode BaseOpcode = !cast<MIMGBaseOpcode>(!strconcat("IMAGE_", I.P.OpMod));
!if(!eq(dwords, 1), "_V1",		AMDGPUDimProps Dim = I.P.Dim;
!if(!eq(dwords, 2), "_V2",
!if(!le(dwords, 4), "_V4",
!if(!le(dwords, 8), "_V8",
!if(!le(dwords, 16), "_V16", ?))))));
ValueType VT = !if(!lt(dwords, 1), ?,
!if(!eq(dwords, 1), f32,
!if(!eq(dwords, 2), v2f32,
!if(!le(dwords, 4), v4f32,
!if(!le(dwords, 8), v8f32,
!if(!le(dwords, 16), v16f32, ?))))));
RegisterClass VReg = !if(!lt(dwords, 1), ?,
!if(!eq(dwords, 1), VGPR_32,
!if(!eq(dwords, 2), VReg_64,
!if(!le(dwords, 4), VReg_128,
!if(!le(dwords, 8), VReg_256,
!if(!le(dwords, 16), VReg_512, ?))))));
}

class makeRegSequence_Fold<int i, dag d> {
int idx = i;
dag lhs = d;
}

// Generate a dag node which returns a vector register of class RC into which
// the source operands given by names have been inserted (assuming that each
// name corresponds to an operand whose size is equal to a subregister).
class makeRegSequence<ValueType vt, RegisterClass RC, list<string> names> {
dag ret =
!if(!eq(!size(names), 1),
!dag(COPY_TO_REGCLASS, [?, RC], [names[0], ?]),
!foldl(makeRegSequence_Fold<0, (vt (IMPLICIT_DEF))>, names, f, name,
makeRegSequence_Fold<
!add(f.idx, 1),
!con((INSERT_SUBREG f.lhs),
!dag(INSERT_SUBREG, [?, !cast<SubRegIndex>("sub"#f.idx)],
[name, ?]))>).lhs);
}

class ImageDimPattern<AMDGPUImageDimIntrinsic I,
string dop, ValueType dty, bit d16,
string suffix = ""> : GCNPat<(undef), (undef)> {
list<AMDGPUArg> AddrArgs = I.P.AddrDefaultArgs;
getDwordsType AddrDwords = getDwordsType<!size(AddrArgs)>;

MIMG MI =
!cast<MIMG>(!strconcat("IMAGE_", I.P.OpMod, dop, AddrDwords.suffix, suffix));

// DAG fragment to match data arguments (vdata for store/atomic, dmask
// for non-atomic).
dag MatchDataDag =
!con(!dag(I, !foreach(arg, I.P.DataArgs, dty),
!foreach(arg, I.P.DataArgs, arg.Name)),
!if(I.P.IsAtomic, (I), (I i32:$dmask)));

// DAG fragment to match vaddr arguments.
dag MatchAddrDag = !dag(I, !foreach(arg, AddrArgs, arg.Type.VT),
!foreach(arg, AddrArgs, arg.Name));

// DAG fragment to match sampler resource and unorm arguments.
dag MatchSamplerDag = !if(I.P.IsSample, (I v4i32:$sampler, i1:$unorm), (I));

// DAG node that generates the MI vdata for store/atomic
getDwordsType DataDwords = getDwordsType<!size(I.P.DataArgs)>;
dag GenDataDag =
!if(I.P.IsAtomic, (MI makeRegSequence<DataDwords.VT, DataDwords.VReg,
!foreach(arg, I.P.DataArgs, arg.Name)>.ret),
!if(!size(I.P.DataArgs), (MI $vdata), (MI)));

// DAG node that generates the MI vaddr
dag GenAddrDag = makeRegSequence<AddrDwords.VT, AddrDwords.VReg,
!foreach(arg, AddrArgs, arg.Name)>.ret;
// DAG fragments that generate various inline flags
dag GenDmask =
!if(I.P.IsAtomic, (MI !add(!shl(1, DataDwords.NumDwords), -1)),
(MI (as_i32imm $dmask)));
dag GenGLC =
!if(I.P.IsAtomic, (MI 1),
(MI (bitextract_imm<0> $cachepolicy)));

dag MatchIntrinsic = !con(MatchDataDag,
MatchAddrDag,
(I v8i32:$rsrc),
MatchSamplerDag,
(I 0/*texfailctrl*/,
i32:$cachepolicy));
let PatternToMatch =
!if(!size(I.RetTypes), (dty MatchIntrinsic), MatchIntrinsic);

bit IsCmpSwap = !and(I.P.IsAtomic, !eq(!size(I.P.DataArgs), 2));
dag ImageInstruction =
!con(GenDataDag,
(MI GenAddrDag),
(MI $rsrc),
!if(I.P.IsSample, (MI $sampler), (MI)),
GenDmask,
!if(I.P.IsSample, (MI (as_i1imm $unorm)), (MI 1)),
GenGLC,
(MI (bitextract_imm<1> $cachepolicy),
0, /* r128 */
0, /* tfe */
0 /*(as_i1imm $lwe)*/,
{ I.P.Dim.DA }),
!if(MI.BaseOpcode.HasD16, (MI d16), (MI)));
let ResultInstrs = [
!if(IsCmpSwap, (EXTRACT_SUBREG ImageInstruction, sub0), ImageInstruction)
];
}

foreach intr = !listconcat(AMDGPUImageDimIntrinsics,
AMDGPUImageDimGetResInfoIntrinsics) in {
def intr#_pat_v1 : ImageDimPattern<intr, "_V1", f32, 0>;
def intr#_pat_v2 : ImageDimPattern<intr, "_V2", v2f32, 0>;
def intr#_pat_v4 : ImageDimPattern<intr, "_V4", v4f32, 0>;
}

multiclass ImageDimD16Helper<AMDGPUImageDimIntrinsic I,
AMDGPUImageDimIntrinsic d16helper> {
let SubtargetPredicate = HasUnpackedD16VMem in {
def _unpacked_v1 : ImageDimPattern<I, "_V1", f16, 1>;
def _unpacked_v2 : ImageDimPattern<d16helper, "_V2", v2i32, 1>;
def _unpacked_v4 : ImageDimPattern<d16helper, "_V4", v4i32, 1>;
} // End HasUnpackedD16VMem.

let SubtargetPredicate = HasPackedD16VMem in {
def _packed_v1 : ImageDimPattern<I, "_V1", f16, 1>;
def _packed_v2 : ImageDimPattern<I, "_V1", v2f16, 1>;
def _packed_v4 : ImageDimPattern<I, "_V2", v4f16, 1>;
} // End HasPackedD16VMem.
}

foreach intr = AMDGPUImageDimIntrinsics in {
def intr#_d16helper_profile : AMDGPUDimProfileCopy<intr.P> {
let RetTypes = !foreach(ty, intr.P.RetTypes, llvm_any_ty);
let DataArgs = !foreach(arg, intr.P.DataArgs, AMDGPUArg<llvm_any_ty, arg.Name>);
}

let TargetPrefix = "SI", isTarget = 1 in
def int_SI_image_d16helper_ # intr.P.OpMod # intr.P.Dim.Name :
AMDGPUImageDimIntrinsic<!cast<AMDGPUDimProfile>(intr#"_d16helper_profile"),
intr.IntrProperties, intr.Properties>;

defm intr#_d16 :
ImageDimD16Helper<
intr, !cast<AMDGPUImageDimIntrinsic>(
"int_SI_image_d16helper_" # intr.P.OpMod # intr.P.Dim.Name)>;
}

foreach intr = AMDGPUImageDimGatherIntrinsics in {
def intr#_pat3 : ImageDimPattern<intr, "_V4", v4f32, 0>;

def intr#_d16helper_profile : AMDGPUDimProfileCopy<intr.P> {
let RetTypes = !foreach(ty, intr.P.RetTypes, llvm_any_ty);
let DataArgs = !foreach(arg, intr.P.DataArgs, AMDGPUArg<llvm_any_ty, arg.Name>);
}		}

let TargetPrefix = "SI", isTarget = 1 in		def ImageDimIntrinsicTable : GenericTable {
def int_SI_image_d16helper_ # intr.P.OpMod # intr.P.Dim.Name :		let FilterClass = "ImageDimIntrinsicInfo";
AMDGPUImageDimIntrinsic<!cast<AMDGPUDimProfile>(intr#"_d16helper_profile"),		let Fields = ["Intr", "BaseOpcode", "Dim"];
intr.IntrProperties, intr.Properties>;		GenericEnum TypeOf_BaseOpcode = MIMGBaseOpcode;
		GenericEnum TypeOf_Dim = MIMGDim;
let SubtargetPredicate = HasUnpackedD16VMem in {
def intr#_unpacked_v4 :
ImageDimPattern<!cast<AMDGPUImageDimIntrinsic>(
"int_SI_image_d16helper_" # intr.P.OpMod # intr.P.Dim.Name),
"_V4", v4i32, 1>;
} // End HasUnpackedD16VMem.

let SubtargetPredicate = HasPackedD16VMem in {		let PrimaryKey = ["Intr"];
def intr#_packed_v4 : ImageDimPattern<intr, "_V2", v4f16, 1>;		let PrimaryKeyName = "getImageDimIntrinsicInfo";
} // End HasPackedD16VMem.		let PrimaryKeyEarlyOut = 1;
}		}

foreach intr = AMDGPUImageDimAtomicIntrinsics in {		foreach intr = !listconcat(AMDGPUImageDimIntrinsics,
def intr#_pat1 : ImageDimPattern<intr, "_V1", i32, 0>;		AMDGPUImageDimAtomicIntrinsics) in {
		def : ImageDimIntrinsicInfo<intr>;
}		}

/******** ======================= ********/		/******** ======================= ********/
/******** Image sampling patterns ********/		/******** Image sampling patterns ********/
/******** ======================= ********/		/******** ======================= ********/

// ImageSample for amdgcn		// ImageSample for amdgcn
// TODO:		// TODO:
(375 lines not shown)

lib/Target/AMDGPU/SIISelLowering.h

(36 lines not shown)	SDValue getPreloadedValue(SelectionDAG &DAG,
const SIMachineFunctionInfo &MFI,		const SIMachineFunctionInfo &MFI,
EVT VT,		EVT VT,
AMDGPUFunctionArgInfo::PreloadedValue) const;		AMDGPUFunctionArgInfo::PreloadedValue) const;

SDValue LowerGlobalAddress(AMDGPUMachineFunction *MFI, SDValue Op,		SDValue LowerGlobalAddress(AMDGPUMachineFunction *MFI, SDValue Op,
SelectionDAG &DAG) const override;		SelectionDAG &DAG) const override;
SDValue lowerImplicitZextParam(SelectionDAG &DAG, SDValue Op,		SDValue lowerImplicitZextParam(SelectionDAG &DAG, SDValue Op,
MVT VT, unsigned Offset) const;		MVT VT, unsigned Offset) const;
		SDValue lowerImage(SDValue Op, const AMDGPU::ImageDimIntrinsicInfo *Intr,
		SelectionDAG &DAG) const;

SDValue LowerINTRINSIC_WO_CHAIN(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerINTRINSIC_WO_CHAIN(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerINTRINSIC_W_CHAIN(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerINTRINSIC_W_CHAIN(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerINTRINSIC_VOID(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerINTRINSIC_VOID(SDValue Op, SelectionDAG &DAG) const;

SDValue widenLoad(LoadSDNode *Ld, DAGCombinerInfo &DCI) const;		SDValue widenLoad(LoadSDNode *Ld, DAGCombinerInfo &DCI) const;
SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSELECT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSELECT(SDValue Op, SelectionDAG &DAG) const;
(258 lines not shown)

lib/Target/AMDGPU/SIISelLowering.cpp

(4,510 lines not shown)	static SDValue emitRemovedIntrinsicError(SelectionDAG &DAG, const SDLoc &DL,
EVT VT) {		EVT VT) {
DiagnosticInfoUnsupported BadIntrin(DAG.getMachineFunction().getFunction(),		DiagnosticInfoUnsupported BadIntrin(DAG.getMachineFunction().getFunction(),
"intrinsic not supported on subtarget",		"intrinsic not supported on subtarget",
DL.getDebugLoc());		DL.getDebugLoc());
DAG.getContext()->diagnose(BadIntrin);		DAG.getContext()->diagnose(BadIntrin);
return DAG.getUNDEF(VT);		return DAG.getUNDEF(VT);
}		}

		static SDValue getBuildDwordsVector(SelectionDAG &DAG, SDLoc DL,
		ArrayRef<SDValue> Elts) {
		assert(!Elts.empty());
		MVT Type;
		unsigned NumElts;

		if (Elts.size() == 1) {
		Type = MVT::f32;
		NumElts = 1;
		} else if (Elts.size() == 2) {
		Type = MVT::v2f32;
		NumElts = 2;
		} else if (Elts.size() <= 4) {
		Type = MVT::v4f32;
		NumElts = 4;
		} else if (Elts.size() <= 8) {
		Type = MVT::v8f32;
		NumElts = 8;
		} else {
		assert(Elts.size() <= 16);
		Type = MVT::v16f32;
		NumElts = 16;
		}

		SmallVector<SDValue, 16> VecElts(NumElts);
		for (unsigned i = 0; i < Elts.size(); ++i) {
		SDValue Elt = Elts[i];
		if (Elt.getValueType() != MVT::f32)
		Elt = DAG.getBitcast(MVT::f32, Elt);
		VecElts[i] = Elt;
		}
		for (unsigned i = Elts.size(); i < NumElts; ++i)
		VecElts[i] = DAG.getUNDEF(MVT::f32);

		if (NumElts == 1)
		return VecElts[0];
		return DAG.getBuildVector(Type, DL, VecElts);
		}
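
The size bucketing in getBuildDwordsVector can be checked in isolation. The following is a minimal standalone sketch, not part of the patch: `paddedDwordCount` and `undefLanes` are hypothetical names, and the code only models how many 32-bit lanes the helper allocates and how many of them are filled with undef.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not in the patch): number of f32 lanes that
// getBuildDwordsVector allocates for `numElts` input dwords.
// The buckets mirror the patch: 1 (scalar f32), 2, 4, 8, 16.
unsigned paddedDwordCount(std::size_t numElts) {
  assert(numElts >= 1 && numElts <= 16 && "expected 1..16 dwords");
  if (numElts <= 2)
    return static_cast<unsigned>(numElts); // f32 or v2f32, no padding
  if (numElts <= 4)
    return 4; // v4f32
  if (numElts <= 8)
    return 8; // v8f32
  return 16;  // v16f32
}

// Hypothetical helper: number of trailing lanes filled with undef.
unsigned undefLanes(std::size_t numElts) {
  return paddedDwordCount(numElts) - static_cast<unsigned>(numElts);
}
```

For example, three address words end up in a v4f32 with one undef lane, and five words end up in a v8f32 with three undef lanes.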

		static bool parseCachePolicy(SDValue CachePolicy, SelectionDAG &DAG,
		SDValue *GLC, SDValue *SLC) {
		auto CachePolicyConst = dyn_cast<ConstantSDNode>(CachePolicy.getNode());
		if (!CachePolicyConst)
		return false;

		uint64_t Value = CachePolicyConst->getZExtValue();
		SDLoc DL(CachePolicy);
		if (GLC) {
		*GLC = DAG.getTargetConstant((Value & 0x1) ? 1 : 0, DL, MVT::i32);
		Value &= ~(uint64_t)0x1;
		}
		if (SLC) {
		*SLC = DAG.getTargetConstant((Value & 0x2) ? 1 : 0, DL, MVT::i32);
		Value &= ~(uint64_t)0x2;
		}

		return Value == 0;
		}
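
The bit handling in parseCachePolicy can be sketched without any SelectionDAG types. This is an illustrative standalone model, assuming bit 0 is glc and bit 1 is slc as in the code above; `decodeCachePolicy` is a hypothetical name and plain `bool` outputs stand in for the target constants.

```cpp
#include <cstdint>

// Hypothetical standalone model of parseCachePolicy (not in the patch).
// Each non-null output consumes its bit; any bit left over afterwards
// makes the policy invalid, so the caller can bail out of the lowering.
bool decodeCachePolicy(std::uint64_t value, bool *glc, bool *slc) {
  if (glc) {
    *glc = (value & 0x1) != 0;
    value &= ~std::uint64_t{0x1};
  }
  if (slc) {
    *slc = (value & 0x2) != 0;
    value &= ~std::uint64_t{0x2};
  }
  return value == 0;
}
```

Passing a null pointer for glc models the atomic case in lowerImage: bit 0 is then not consumed, so a cache policy with bit 0 set is rejected rather than silently accepted.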

		SDValue SITargetLowering::lowerImage(SDValue Op,
		const AMDGPU::ImageDimIntrinsicInfo *Intr,
		SelectionDAG &DAG) const {
		SDLoc DL(Op);
		MachineFunction &MF = DAG.getMachineFunction();
		const AMDGPU::MIMGBaseOpcodeInfo *BaseOpcode =
		AMDGPU::getMIMGBaseOpcodeInfo(Intr->BaseOpcode);
		const AMDGPU::MIMGDimInfo *DimInfo = AMDGPU::getMIMGDimInfo(Intr->Dim);

		SmallVector<EVT, 2> ResultTypes(Op->value_begin(), Op->value_end());
		bool IsD16 = false;
		SDValue VData;
		int NumVDataDwords;
		unsigned AddrIdx; // Index of first address argument
		unsigned DMask;

		if (BaseOpcode->Atomic) {
		VData = Op.getOperand(2);

		bool Is64Bit = VData.getValueType() == MVT::i64;
		if (BaseOpcode->AtomicX2) {
		SDValue VData2 = Op.getOperand(3);
		VData = DAG.getBuildVector(Is64Bit ? MVT::v2i64 : MVT::v2i32, DL,
		{VData, VData2});
		if (Is64Bit)
		VData = DAG.getBitcast(MVT::v4i32, VData);

		ResultTypes[0] = Is64Bit ? MVT::v2i64 : MVT::v2i32;
		DMask = Is64Bit ? 0xf : 0x3;
		NumVDataDwords = Is64Bit ? 4 : 2;
		AddrIdx = 4;
		} else {
		DMask = Is64Bit ? 0x3 : 0x1;
		NumVDataDwords = Is64Bit ? 2 : 1;
		AddrIdx = 3;
		}
		} else {
		unsigned DMaskIdx;

		if (BaseOpcode->Store) {
		VData = Op.getOperand(2);

		MVT StoreVT = VData.getSimpleValueType();
		if (StoreVT.getScalarType() == MVT::f16) {
		[Inline comment] arsenm: .ScalarType() == s16
		[Inline comment] nhaehnle (author): Done.
		if (Subtarget->getGeneration() < SISubtarget::VOLCANIC_ISLANDS ||
		[Inline comment] arsenm: I think we already have a hasD16 feature?
		[Inline comment] nhaehnle (author): I don't see one. There's a hasUnpackedD16VMem(), but that just says whether `D16` is unpacked or packed (if it exists). FWIW, I'm not a fan of having explicit feature flags for features that are already clearly delineated by hardware generations, which is the case for `D16`.
		!BaseOpcode->HasD16)
		return Op; // D16 is unsupported for this instruction

		IsD16 = true;
		VData = handleD16VData(VData, DAG);
		}

		NumVDataDwords = (VData.getValueType().getSizeInBits() + 31) / 32;
		DMaskIdx = 3;
		} else {
		MVT LoadVT = Op.getSimpleValueType();
		if (LoadVT.getScalarType() == MVT::f16) {
		if (Subtarget->getGeneration() < SISubtarget::VOLCANIC_ISLANDS ||
		!BaseOpcode->HasD16)
		return Op; // D16 is unsupported for this instruction

		IsD16 = true;
		if (LoadVT.isVector() && Subtarget->hasUnpackedD16VMem())
		ResultTypes[0] = (LoadVT == MVT::v2f16) ? MVT::v2i32 : MVT::v4i32;
		}
		[Inline comment] arsenm: I'm going to be committing that patch soon so might as well just wait for that
		[Inline comment] nhaehnle (author): Sure, will do. There's a few more changes to come anyway, and I want to submit them all at once to reduce merge conflicts with our internal branches.

		NumVDataDwords = (ResultTypes[0].getSizeInBits() + 31) / 32;
		DMaskIdx = isa<MemSDNode>(Op) ? 2 : 1;
		}

		auto DMaskConst = dyn_cast<ConstantSDNode>(Op.getOperand(DMaskIdx));
		if (!DMaskConst)
		return Op;

		AddrIdx = DMaskIdx + 1;
		DMask = DMaskConst->getZExtValue();
		if (!DMask && !BaseOpcode->Store) {
		// Eliminate no-op loads. Stores with dmask == 0 are not no-op: they
		// store the channels' default values.
		SDValue Undef = DAG.getUNDEF(Op.getValueType());
		if (isa<MemSDNode>(Op))
		return DAG.getMergeValues({Undef, Op.getOperand(0)}, DL);
		return Undef;
		}
		}

		unsigned NumVAddrs = BaseOpcode->NumExtraArgs +
		(BaseOpcode->Gradients ? DimInfo->NumGradients : 0) +
		[Inline comment] arsenm: Isn't this always the case?
		[Inline comment] nhaehnle (author): No. getlod and getresinfo don't access memory and are INTRINSIC_WO_CHAIN.
		(BaseOpcode->Coordinates ? DimInfo->NumCoords : 0) +
		(BaseOpcode->LodOrClampOrMip ? 1 : 0);
		SmallVector<SDValue, 4> VAddrs;
		for (unsigned i = 0; i < NumVAddrs; ++i)
		VAddrs.push_back(Op.getOperand(AddrIdx + i));
		SDValue VAddr = getBuildDwordsVector(DAG, DL, VAddrs);

		SDValue True = DAG.getTargetConstant(1, DL, MVT::i1);
		SDValue False = DAG.getTargetConstant(0, DL, MVT::i1);
		unsigned CtrlIdx; // Index of texfailctrl argument
		SDValue Unorm;
		if (!BaseOpcode->Sampler) {
		Unorm = True;
		CtrlIdx = AddrIdx + NumVAddrs + 1;
		} else {
		auto UnormConst =
		[Inline comment] arsenm: i1?
		[Inline comment] nhaehnle (author): Good point, changed. It doesn't seem like any downstream users care...
		dyn_cast<ConstantSDNode>(Op.getOperand(AddrIdx + NumVAddrs + 2));
		if (!UnormConst)
		return Op;

		Unorm = UnormConst->getZExtValue() ? True : False;
		CtrlIdx = AddrIdx + NumVAddrs + 3;
		}

		SDValue TexFail = Op.getOperand(CtrlIdx);
		auto TexFailConst = dyn_cast<ConstantSDNode>(TexFail.getNode());
		if (!TexFailConst || TexFailConst->getZExtValue() != 0)
		return Op;

		SDValue GLC;
		SDValue SLC;
		if (BaseOpcode->Atomic) {
		GLC = True; // TODO no-return optimization
		if (!parseCachePolicy(Op.getOperand(CtrlIdx + 1), DAG, nullptr, &SLC))
		return Op;
		} else {
		if (!parseCachePolicy(Op.getOperand(CtrlIdx + 1), DAG, &GLC, &SLC))
		return Op;
		}

		SmallVector<SDValue, 14> Ops;
		if (BaseOpcode->Store || BaseOpcode->Atomic)
		Ops.push_back(VData); // vdata
		Ops.push_back(VAddr);
		Ops.push_back(Op.getOperand(AddrIdx + NumVAddrs)); // rsrc
		if (BaseOpcode->Sampler)
		Ops.push_back(Op.getOperand(AddrIdx + NumVAddrs + 1)); // sampler
		Ops.push_back(DAG.getTargetConstant(DMask, DL, MVT::i32));
		Ops.push_back(Unorm);
		Ops.push_back(GLC);
		Ops.push_back(SLC);
		Ops.push_back(False); // r128
		Ops.push_back(False); // tfe
		Ops.push_back(False); // lwe
		Ops.push_back(DimInfo->DA ? True : False);
		if (BaseOpcode->HasD16)
		Ops.push_back(IsD16 ? True : False);
		if (isa<MemSDNode>(Op))
		Ops.push_back(Op.getOperand(0)); // chain

		int NumVAddrDwords = VAddr.getValueType().getSizeInBits() / 32;
		int Opcode = -1;

		if (Subtarget->getGeneration() >= SISubtarget::VOLCANIC_ISLANDS)
		Opcode = AMDGPU::getMIMGOpcode(Intr->BaseOpcode, AMDGPU::MIMGEncGfx8,
		NumVDataDwords, NumVAddrDwords);
		if (Opcode == -1)
		Opcode = AMDGPU::getMIMGOpcode(Intr->BaseOpcode, AMDGPU::MIMGEncGfx6,
		NumVDataDwords, NumVAddrDwords);
		assert(Opcode != -1);

		MachineSDNode *NewNode = DAG.getMachineNode(Opcode, DL, ResultTypes, Ops);
		if (auto MemOp = dyn_cast<MemSDNode>(Op)) {
		MachineInstr::mmo_iterator MemRefs = MF.allocateMemRefsArray(1);
		*MemRefs = MemOp->getMemOperand();
		NewNode->setMemRefs(MemRefs, MemRefs + 1);
		}

		if (BaseOpcode->AtomicX2) {
		SmallVector<SDValue, 1> Elt;
		DAG.ExtractVectorElements(SDValue(NewNode, 0), Elt, 0, 1);
		return DAG.getMergeValues({Elt[0], SDValue(NewNode, 1)}, DL);
		} else if (IsD16 && !BaseOpcode->Store) {
		MVT LoadVT = Op.getSimpleValueType();
		SDValue Adjusted = adjustLoadValueTypeImpl(
		SDValue(NewNode, 0), LoadVT, DL, DAG, Subtarget->hasUnpackedD16VMem());
		return DAG.getMergeValues({Adjusted, SDValue(NewNode, 1)}, DL);
		}

		return SDValue(NewNode, 0);
		}
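
Two pieces of arithmetic in lowerImage drive opcode selection: the number of address operands and the number of dwords a value occupies. The sketch below models both in isolation. The structs are trimmed, hypothetical copies of the relevant MIMGBaseOpcodeInfo and MIMGDimInfo fields, and any concrete counts passed in (for instance 2 coordinates and 4 gradients for a 2D image) are assumptions for illustration, not values read from the generated tables.

```cpp
#include <cstdint>

// Hypothetical, trimmed mirrors of the fields lowerImage reads.
struct BaseOpcodeFlags {
  std::uint8_t NumExtraArgs;
  bool Gradients;
  bool Coordinates;
  bool LodOrClampOrMip;
};

struct DimCounts {
  std::uint8_t NumCoords;
  std::uint8_t NumGradients;
};

// Same sum as the NumVAddrs computation in lowerImage.
unsigned countVAddrs(const BaseOpcodeFlags &B, const DimCounts &D) {
  return B.NumExtraArgs + (B.Gradients ? D.NumGradients : 0) +
         (B.Coordinates ? D.NumCoords : 0) + (B.LodOrClampOrMip ? 1 : 0);
}

// Round a bit width up to whole dwords, as done for NumVDataDwords.
unsigned dwordsForBits(unsigned Bits) { return (Bits + 31) / 32; }
```

Under these assumed counts, an opcode with one extra argument, gradients, coordinates and a lod/clamp/mip operand needs 1 + 4 + 2 + 1 = 8 address operands, while a 48-bit value rounds up to 2 dwords.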

SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,		SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
MachineFunction &MF = DAG.getMachineFunction();		MachineFunction &MF = DAG.getMachineFunction();
auto MFI = MF.getInfo<SIMachineFunctionInfo>();		auto MFI = MF.getInfo<SIMachineFunctionInfo>();

EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
SDLoc DL(Op);		SDLoc DL(Op);
unsigned IntrinsicID = cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();		unsigned IntrinsicID = cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();
[321 lines not shown]	case Intrinsic::amdgcn_image_getresinfo: {

// Replace dmask with everything disabled with undef.		// Replace dmask with everything disabled with undef.
const ConstantSDNode *DMask = dyn_cast<ConstantSDNode>(Op.getOperand(Idx));		const ConstantSDNode *DMask = dyn_cast<ConstantSDNode>(Op.getOperand(Idx));
if (!DMask || DMask->isNullValue())		if (!DMask || DMask->isNullValue())
return DAG.getUNDEF(Op.getValueType());		return DAG.getUNDEF(Op.getValueType());
return SDValue();		return SDValue();
}		}
default:		default:
		if (const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr =
		AMDGPU::getImageDimIntrinsicInfo(IntrinsicID))
		return lowerImage(Op, ImageDimIntr, DAG);

return Op;		return Op;
}		}
}		}

SDValue SITargetLowering::LowerINTRINSIC_W_CHAIN(SDValue Op,		SDValue SITargetLowering::LowerINTRINSIC_W_CHAIN(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
unsigned IntrID = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();		unsigned IntrID = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
SDLoc DL(Op);		SDLoc DL(Op);
[265 lines not shown]	if (Subtarget->hasUnpackedD16VMem() &&
Op.getValueType().getScalarSizeInBits() == 16) {		Op.getValueType().getScalarSizeInBits() == 16) {
return adjustLoadValueType(getImageOpcode(IntrID), cast<MemSDNode>(Op),		return adjustLoadValueType(getImageOpcode(IntrID), cast<MemSDNode>(Op),
DAG);		DAG);
}		}

return SDValue();		return SDValue();
}		}
default:		default:
if (Subtarget->hasUnpackedD16VMem() &&		if (const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr =
Op.getValueType().isVector() &&		AMDGPU::getImageDimIntrinsicInfo(IntrID))
Op.getValueType().getScalarSizeInBits() == 16) {		return lowerImage(Op, ImageDimIntr, DAG);
if (const AMDGPU::D16ImageDimIntrinsic *D16ImageDimIntr =
AMDGPU::lookupD16ImageDimIntrinsic(IntrID)) {
return adjustLoadValueType(D16ImageDimIntr->D16HelperIntr,
cast<MemSDNode>(Op), DAG, true);
}
}

return SDValue();		return SDValue();
}		}
}		}

SDValue SITargetLowering::handleD16VData(SDValue VData,		SDValue SITargetLowering::handleD16VData(SDValue VData,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
EVT StoreVT = VData.getValueType();		EVT StoreVT = VData.getValueType();
[233 lines not shown]	if (Subtarget->hasUnpackedD16VMem() &&
MemSDNode *M = cast<MemSDNode>(Op);		MemSDNode *M = cast<MemSDNode>(Op);
return DAG.getMemIntrinsicNode(Opc, DL, Op->getVTList(), Ops,		return DAG.getMemIntrinsicNode(Opc, DL, Op->getVTList(), Ops,
M->getMemoryVT(), M->getMemOperand());		M->getMemoryVT(), M->getMemOperand());
}		}

return SDValue();		return SDValue();
}		}
default: {		default: {
const AMDGPU::D16ImageDimIntrinsic *D16ImageDimIntr =		if (const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr =
AMDGPU::lookupD16ImageDimIntrinsic(IntrinsicID);		AMDGPU::getImageDimIntrinsicInfo(IntrinsicID))
if (D16ImageDimIntr) {		return lowerImage(Op, ImageDimIntr, DAG);
SDValue VData = Op.getOperand(2);
EVT StoreVT = VData.getValueType();
if (Subtarget->hasUnpackedD16VMem() &&
StoreVT.isVector() &&
StoreVT.getScalarSizeInBits() == 16) {
SmallVector<SDValue, 12> Ops(Op.getNode()->op_values());

Ops[1] = DAG.getConstant(D16ImageDimIntr->D16HelperIntr, DL, MVT::i32);
Ops[2] = handleD16VData(VData, DAG);

MemSDNode *M = cast<MemSDNode>(Op);
return DAG.getMemIntrinsicNode(ISD::INTRINSIC_VOID, DL, Op->getVTList(),
Ops, M->getMemoryVT(),
M->getMemOperand());
}
}

return Op;		return Op;
}		}
}		}
}		}

static SDValue getLoadExtOrTrunc(SelectionDAG &DAG,		static SDValue getLoadExtOrTrunc(SelectionDAG &DAG,
ISD::LoadExtType ExtType, SDValue Op,		ISD::LoadExtType ExtType, SDValue Op,
[2,934 lines not shown]

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h

	[33 lines not shown]
	class MCRegisterInfo;			class MCRegisterInfo;
	class MCSection;			class MCSection;
	class MCSubtargetInfo;			class MCSubtargetInfo;
	class Triple;			class Triple;

	namespace AMDGPU {			namespace AMDGPU {

	#define GET_MIMGBaseOpcode_DECL			#define GET_MIMGBaseOpcode_DECL
				#define GET_MIMGDim_DECL
	#define GET_MIMGEncoding_DECL			#define GET_MIMGEncoding_DECL
	#include "AMDGPUGenSearchableTables.inc"			#include "AMDGPUGenSearchableTables.inc"

	namespace IsaInfo {			namespace IsaInfo {

	enum {			enum {
	// The closed Vulkan driver sets 96, which limits the wave count to 8 but			// The closed Vulkan driver sets 96, which limits the wave count to 8 but
	// doesn't spill SGPRs as much as when 80 is set.			// doesn't spill SGPRs as much as when 80 is set.
	[107 lines not shown]
	/// execution unit requirement for given subtarget \p Features.			/// execution unit requirement for given subtarget \p Features.
	unsigned getMaxNumVGPRs(const FeatureBitset &Features, unsigned WavesPerEU);			unsigned getMaxNumVGPRs(const FeatureBitset &Features, unsigned WavesPerEU);

	} // end namespace IsaInfo			} // end namespace IsaInfo

	LLVM_READONLY			LLVM_READONLY
	int16_t getNamedOperandIdx(uint16_t Opcode, uint16_t NamedIdx);			int16_t getNamedOperandIdx(uint16_t Opcode, uint16_t NamedIdx);

				struct MIMGBaseOpcodeInfo {
				MIMGBaseOpcode BaseOpcode;
				bool Store;
				bool Atomic;
				bool AtomicX2;
				bool Sampler;

				uint8_t NumExtraArgs;
				bool Gradients;
				bool Coordinates;
				bool LodOrClampOrMip;
				bool HasD16;
				};

				LLVM_READONLY
				const MIMGBaseOpcodeInfo *getMIMGBaseOpcodeInfo(unsigned BaseOpcode);

				struct MIMGDimInfo {
				MIMGDim Dim;
				uint8_t NumCoords;
				uint8_t NumGradients;
				bool DA;
				};

				LLVM_READONLY
				const MIMGDimInfo *getMIMGDimInfo(unsigned Dim);

				LLVM_READONLY
				int getMIMGOpcode(unsigned BaseOpcode, unsigned MIMGEncoding,
				unsigned VDataDwords, unsigned VAddrDwords);

	LLVM_READONLY			LLVM_READONLY
	int getMaskedMIMGOp(unsigned Opc, unsigned NewChannels);			int getMaskedMIMGOp(unsigned Opc, unsigned NewChannels);

	LLVM_READONLY			LLVM_READONLY
	int getMCOpcode(uint16_t Opcode, unsigned Gen);			int getMCOpcode(uint16_t Opcode, unsigned Gen);

	void initDefaultAMDKernelCodeT(amd_kernel_code_t &Header,			void initDefaultAMDKernelCodeT(amd_kernel_code_t &Header,
	const FeatureBitset &Features);			const FeatureBitset &Features);
	[220 lines not shown]

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp

	[101 lines not shown]
	struct MIMGInfo {			struct MIMGInfo {
	uint16_t Opcode;			uint16_t Opcode;
	uint16_t BaseOpcode;			uint16_t BaseOpcode;
	uint8_t MIMGEncoding;			uint8_t MIMGEncoding;
	uint8_t VDataDwords;			uint8_t VDataDwords;
	uint8_t VAddrDwords;			uint8_t VAddrDwords;
	};			};

				#define GET_MIMGBaseOpcodesTable_IMPL
				#define GET_MIMGDimInfoTable_IMPL
	#define GET_MIMGInfoTable_IMPL			#define GET_MIMGInfoTable_IMPL
	#include "AMDGPUGenSearchableTables.inc"			#include "AMDGPUGenSearchableTables.inc"

				int getMIMGOpcode(unsigned BaseOpcode, unsigned MIMGEncoding,
				unsigned VDataDwords, unsigned VAddrDwords) {
				const MIMGInfo *Info = getMIMGOpcodeHelper(BaseOpcode, MIMGEncoding,
				VDataDwords, VAddrDwords);
				return Info ? Info->Opcode : -1;
				}
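
The way lowerImage uses getMIMGOpcode, trying the gfx8 encoding first on newer subtargets and falling back to the gfx6 encoding, can be sketched with a tiny hand-written table. Everything below is hypothetical: the table rows, the opcode numbers and the linear search stand in for the TableGen-generated searchable table and its getMIMGOpcodeHelper lookup.

```cpp
#include <cstdint>

// Hypothetical encodings and table rows, for illustration only.
enum Encoding : std::uint8_t { EncGfx6 = 0, EncGfx8 = 1 };

struct OpcodeRow {
  std::uint16_t Opcode;
  std::uint16_t BaseOpcode;
  std::uint8_t MIMGEncoding;
  std::uint8_t VDataDwords;
  std::uint8_t VAddrDwords;
};

static const OpcodeRow Table[] = {
    {100, 7, EncGfx6, 4, 1},
    {101, 7, EncGfx6, 4, 2},
    {200, 7, EncGfx8, 4, 2},
};

// Linear-search stand-in for the generated lookup; -1 means "no match".
int lookupOpcode(unsigned Base, unsigned Enc, unsigned VData, unsigned VAddr) {
  for (const OpcodeRow &R : Table)
    if (R.BaseOpcode == Base && R.MIMGEncoding == Enc &&
        R.VDataDwords == VData && R.VAddrDwords == VAddr)
      return R.Opcode;
  return -1;
}

// Same fallback order as lowerImage: gfx8 encoding first when available,
// then the gfx6 encoding.
int selectOpcode(bool IsGfx8OrLater, unsigned Base, unsigned VData,
                 unsigned VAddr) {
  int Opcode = -1;
  if (IsGfx8OrLater)
    Opcode = lookupOpcode(Base, EncGfx8, VData, VAddr);
  if (Opcode == -1)
    Opcode = lookupOpcode(Base, EncGfx6, VData, VAddr);
  return Opcode;
}
```

With this table, a gfx8-or-later subtarget picks the gfx8 row when one exists and otherwise reuses the gfx6 row for the same base opcode and dword counts.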

	int getMaskedMIMGOp(unsigned Opc, unsigned NewChannels) {			int getMaskedMIMGOp(unsigned Opc, unsigned NewChannels) {
	const MIMGInfo *OrigInfo = getMIMGInfo(Opc);			const MIMGInfo *OrigInfo = getMIMGInfo(Opc);
	const MIMGInfo *NewInfo =			const MIMGInfo *NewInfo =
	getMIMGOpcodeHelper(OrigInfo->BaseOpcode, OrigInfo->MIMGEncoding,			getMIMGOpcodeHelper(OrigInfo->BaseOpcode, OrigInfo->MIMGEncoding,
	NewChannels, OrigInfo->VAddrDwords);			NewChannels, OrigInfo->VAddrDwords);
	return NewInfo ? NewInfo->Opcode : -1;			return NewInfo ? NewInfo->Opcode : -1;
	}			}

	[789 lines not shown]

test/CodeGen/AMDGPU/llvm.amdgcn.image.getlod.dim.ll

This file was added.

				; RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=GCN %s
				; RUN: llc < %s -march=amdgcn -mcpu=gfx900 -verify-machineinstrs | FileCheck --check-prefix=GCN %s

				; GCN-LABEL: {{^}}getlod_1d:
				; GCN: image_get_lod v[0:3], v0, s[0:7], s[8:11] dmask:0xf{{$}}
				; GCN: s_waitcnt vmcnt(0)
				define amdgpu_ps <4 x float> @getlod_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.getlod.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
				ret <4 x float> %r
				}

				; GCN-LABEL: {{^}}getlod_2d:
				; GCN: image_get_lod v[0:1], v[0:1], s[0:7], s[8:11] dmask:0x3{{$}}
				; GCN: s_waitcnt vmcnt(0)
				define amdgpu_ps <2 x float> @getlod_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {
				main_body:
				%r = call <2 x float> @llvm.amdgcn.image.getlod.2d.v2f32.f32(i32 3, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
				ret <2 x float> %r
				}

				; GCN-LABEL: {{^}}adjust_writemask_getlod_none_enabled:
				; GCN-NOT: image
				define amdgpu_ps <4 x float> @adjust_writemask_getlod_none_enabled(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.getlod.2d.v4f32.f32(i32 0, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
				ret <4 x float> %r
				}

				declare <4 x float> @llvm.amdgcn.image.getlod.1d.v4f32.f32(i32, float, <8 x i32>, <4 x i32>, i1, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.getlod.2d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #0
				declare <2 x float> @llvm.amdgcn.image.getlod.2d.v2f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #0

				attributes #0 = { nounwind readnone }