This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add a16 feature to gfx10
ClosedPublic

Authored by sebastian-ne on Feb 4 2020, 5:21 AM.

Details

Reviewers

nhaehnle
arsenm
rtaylor

Commits

rG8756869170e6: [AMDGPU] Add a16 feature to gfx10

Summary

Based on D72931

This adds a new feature called A16 which is enabled for gfx10.
gfx9 keeps the R128A16 feature so it can share all the instruction encodings
with gfx7/8.
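
As a rough illustration of the split described in the summary, the sketch below models the two feature bits per generation. The `FeatureBits`, `Gen`, `featuresFor`, and `a16Style` names are hypothetical and exist only for this example; the assumption that gfx8 sets neither bit is also mine and is not stated in the summary.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the two subtarget feature bits in this patch.
struct FeatureBits {
  bool HasR128A16 = false;  // gfx9-style: a16 is aliased with the r128 bit
  bool HasGFX10A16 = false; // gfx10-style: a16 is a separate operand
};

enum class Gen { GFX8, GFX9, GFX10 };

// Assumption for this sketch: gfx9 keeps R128A16, gfx10 gets GFX10A16,
// and gfx8 has neither.
FeatureBits featuresFor(Gen G) {
  FeatureBits F;
  F.HasR128A16 = (G == Gen::GFX9);
  F.HasGFX10A16 = (G == Gen::GFX10);
  return F;
}

// Names the a16 flavour a target supports, using the feature strings
// that appear in AMDGPU.td ("r128-a16" and "a16").
std::string a16Style(const FeatureBits &F) {
  if (F.HasGFX10A16)
    return "a16";
  if (F.HasR128A16)
    return "r128-a16";
  return "none";
}
```

Keeping the two bits separate is what lets gfx9 continue to share its encodings with older targets while gfx10 gains its own operand.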

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 45843
Build 47948: arc lint + arc unit

Event Timeline

sebastian-ne created this revision. Feb 4 2020, 5:21 AM

Herald added a project: Restricted Project. Feb 4 2020, 5:21 AM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 7 others.

Harbormaster completed remote builds in B45674: Diff 242302. Feb 4 2020, 5:28 AM

Seems like it's missing some assembler/disassembler tests

Add gfx10 to a16 tests

Harbormaster completed remote builds in B45764: Diff 242598. Feb 5 2020, 6:39 AM

The encoding changes cause a mess with the feature definitions, but cleaning it up would require splitting up machine opcodes further which would bloat TableGen tables... so let's keep this approach. I'd ask for one cleanup though.

llvm/lib/Target/AMDGPU/AMDGPU.td
Lines 372–376: Please rename the feature to gfx10a16, and the description to: "Support gfx10-style A16 for 16-bit coordinates/gradients/lod/clamp/mip image operands". (Note the "coordinates" typo). The idea here is to hopefully reduce future confusion slightly by explicitly calling out that there is gfx9-style A16 vs. gfx10-style A16. At the same time, it seems a good idea to change the description of the R128A16 analogously, perhaps adding ", where a16 is aliased with r128".

Rename a16 to gfx10a16 in some places

Harbormaster completed remote builds in B45843: Diff 242839. Feb 6 2020, 2:17 AM

Thanks, LGTM.

This revision is now accepted and ready to land. Feb 7 2020, 12:54 AM

Closed by commit rG8756869170e6: [AMDGPU] Add a16 feature to gfx10 (authored by sebastian-ne). Feb 10 2020, 12:24 AM

This revision was automatically updated to reflect the committed changes.

foad mentioned this in D141069: [AMDGPU][NFC] Rename GFX10A16 operands. Jan 5 2023, 11:04 PM

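Revision Contents

Path (Size)

llvm/lib/Target/AMDGPU/AMDGPU.td (14 lines)
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h (5 lines)
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp (1 line)
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp (18 lines)
llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h (2 lines)
llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp (5 lines)
[file name missing] (32 lines)
[file name missing] (10 lines)
[file name missing] (2 lines)
[file name missing] (38 lines)
[file name missing] (1 line)
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp (3 lines)
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h (1 line)
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (6 lines)
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.dim.ll (621 lines)
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll (959 lines)
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll (433 lines)
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.load.a16.d16.ll (39 lines)
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.load.a16.ll (39 lines)
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.a16.dim.ll (1422 lines)
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.store.a16.d16.ll (172 lines)
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.store.a16.ll (172 lines)
llvm/test/CodeGen/AMDGPU/mcp-overlap-after-propagation.mir (4 lines)
llvm/test/CodeGen/AMDGPU/nsa-vmem-hazard.mir (10 lines)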

Diff 242839

llvm/lib/Target/AMDGPU/AMDGPU.td

@@ (354 lines hidden) @@ def FeatureDPP8 : SubtargetFeature<"dpp8",
   "HasDPP8",
   "true",
   "Support DPP8 (Data Parallel Primitives) extension"
 >;

 def FeatureR128A16 : SubtargetFeature<"r128-a16",
   "HasR128A16",
   "true",
-  "Support 16 bit coordindates/gradients/lod/clamp/mip types on gfx9"
+  "Support gfx9-style A16 for 16-bit coordinates/gradients/lod/clamp/mip image operands, where a16 is aliased with r128"
+>;
+
+def FeatureGFX10A16 : SubtargetFeature<"a16",
+  "HasGFX10A16",
+  "true",
+  "Support gfx10-style A16 for 16-bit coordinates/gradients/lod/clamp/mip image operands"
 >;

 def FeatureNSAEncoding : SubtargetFeature<"nsa-encoding",
   "HasNSAEncoding",
   "true",
   "Support NSA encoding for image instructions"
 >;

nhaehnle (inline comment): Please rename the feature to gfx10a16, and the description to: "Support gfx10-style A16 for 16-bit coordinates/gradients/lod/clamp/mip image operands". (Note the "coordinates" typo). The idea here is to hopefully reduce future confusion slightly by explicitly calling out that there is gfx9-style A16 vs. gfx10-style A16. At the same time, it seems a good idea to change the description of the R128A16 analogously, perhaps adding ", where a16 is aliased with r128".

 def FeatureIntClamp : SubtargetFeature<"int-clamp-insts",
   "HasIntClamp",
   "true",
   "Support clamp for integer destination"
 >;

 def FeatureUnpackedD16VMem : SubtargetFeature<"unpacked-d16-vmem",
@@ (298 lines hidden) @@ def FeatureGFX10 : GCNSubtargetFeatureGeneration<"GFX10",
    FeatureSMemRealTime, FeatureInv2PiInlineImm,
    FeatureApertureRegs, FeatureGFX9Insts, FeatureGFX10Insts, FeatureVOP3P,
    FeatureMovrel, FeatureFastFMAF32, FeatureDPP, FeatureIntClamp,
    FeatureSDWA, FeatureSDWAOmod, FeatureSDWAScalar, FeatureSDWASdst,
    FeatureFlatInstOffsets, FeatureFlatGlobalInsts, FeatureFlatScratchInsts,
    FeatureAddNoCarryInsts, FeatureFmaMixInsts, FeatureGFX8Insts,
    FeatureNoSdstCMPX, FeatureVscnt, FeatureRegisterBanking,
    FeatureVOP3Literal, FeatureDPP8,
-   FeatureNoDataDepHazard, FeaturePkFmacF16Inst, FeatureDoesNotSupportSRAMECC
+   FeatureNoDataDepHazard, FeaturePkFmacF16Inst, FeatureDoesNotSupportSRAMECC,
+   FeatureGFX10A16
   ]
 >;

 class FeatureSet<list<SubtargetFeature> Features_> {
   list<SubtargetFeature> Features = Features_;
 }

 def FeatureISAVersion6_0_0 : FeatureSet<[FeatureSouthernIslands,
@@ (395 lines hidden) @@ def HasDPP : Predicate<"Subtarget->hasDPP()">,
   AssemblerPredicate<"FeatureGCN3Encoding,FeatureDPP">;

 def HasDPP8 : Predicate<"Subtarget->hasDPP8()">,
   AssemblerPredicate<"!FeatureGCN3Encoding,FeatureGFX10Insts,FeatureDPP8">;

 def HasR128A16 : Predicate<"Subtarget->hasR128A16()">,
   AssemblerPredicate<"FeatureR128A16">;

+def HasGFX10A16 : Predicate<"Subtarget->hasGFX10A16()">,
+  AssemblerPredicate<"FeatureGFX10A16">;
+
 def HasDPP16 : Predicate<"Subtarget->hasDPP()">,
   AssemblerPredicate<"!FeatureGCN3Encoding,FeatureGFX10Insts,FeatureDPP">;

 def HasIntClamp : Predicate<"Subtarget->hasIntClamp()">,
   AssemblerPredicate<"FeatureIntClamp">;

 def HasMadMixInsts : Predicate<"Subtarget->hasMadMixInsts()">,
   AssemblerPredicate<"FeatureMadMixInsts">;
@@ (69 lines hidden) @@

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h

@@ (336 lines hidden) @@ protected:
   bool HasSDWAOmod;
   bool HasSDWAScalar;
   bool HasSDWASdst;
   bool HasSDWAMac;
   bool HasSDWAOutModsVOPC;
   bool HasDPP;
   bool HasDPP8;
   bool HasR128A16;
+  bool HasGFX10A16;
   bool HasNSAEncoding;
   bool HasDLInsts;
   bool HasDot1Insts;
   bool HasDot2Insts;
   bool HasDot3Insts;
   bool HasDot4Insts;
   bool HasDot5Insts;
   bool HasDot6Insts;
@@ (630 lines hidden) @@ public:
   bool hasDPP8() const {
     return HasDPP8;
   }

   bool hasR128A16() const {
     return HasR128A16;
   }

+  bool hasGFX10A16() const {
+    return HasGFX10A16;
+  }
+
   bool hasOffset3fBug() const {
     return HasOffset3fBug;
   }

   bool hasNSAEncoding() const {
     return HasNSAEncoding;
   }

@@ (367 lines hidden) @@

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp

@@ (235 lines hidden) @@ GCNSubtarget::GCNSubtarget(const Triple &TT, StringRef GPU, StringRef FS,
     HasSDWAOmod(false),
     HasSDWAScalar(false),
     HasSDWASdst(false),
     HasSDWAMac(false),
     HasSDWAOutModsVOPC(false),
     HasDPP(false),
     HasDPP8(false),
     HasR128A16(false),
+    HasGFX10A16(false),
     HasNSAEncoding(false),
     HasDLInsts(false),
     HasDot1Insts(false),
     HasDot2Insts(false),
     HasDot3Insts(false),
     HasDot4Insts(false),
     HasDot5Insts(false),
     HasDot6Insts(false),
@@ (647 lines hidden) @@

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp

@@ (157 lines hidden) @@ enum ImmTy {
     ImmTySdwaSrc0Sel,
     ImmTySdwaSrc1Sel,
     ImmTySdwaDstUnused,
     ImmTyDMask,
     ImmTyDim,
     ImmTyUNorm,
     ImmTyDA,
     ImmTyR128A16,
+    ImmTyA16,
     ImmTyLWE,
     ImmTyExpTgt,
     ImmTyExpCompr,
     ImmTyExpVM,
     ImmTyFORMAT,
     ImmTyHwreg,
     ImmTyOff,
     ImmTySendMsg,
@@ (136 lines hidden) @@ public:

   bool isClampSI() const { return isImmTy(ImmTyClampSI); }
   bool isOModSI() const { return isImmTy(ImmTyOModSI); }
   bool isDMask() const { return isImmTy(ImmTyDMask); }
   bool isDim() const { return isImmTy(ImmTyDim); }
   bool isUNorm() const { return isImmTy(ImmTyUNorm); }
   bool isDA() const { return isImmTy(ImmTyDA); }
   bool isR128A16() const { return isImmTy(ImmTyR128A16); }
+  bool isGFX10A16() const { return isImmTy(ImmTyA16); }
   bool isLWE() const { return isImmTy(ImmTyLWE); }
   bool isOff() const { return isImmTy(ImmTyOff); }
   bool isExpTgt() const { return isImmTy(ImmTyExpTgt); }
   bool isExpVM() const { return isImmTy(ImmTyExpVM); }
   bool isExpCompr() const { return isImmTy(ImmTyExpCompr); }
   bool isOffen() const { return isImmTy(ImmTyOffen); }
   bool isIdxen() const { return isImmTy(ImmTyIdxen); }
   bool isAddr64() const { return isImmTy(ImmTyAddr64); }
@@ (516 lines hidden) @@ static void printImmTy(raw_ostream& OS, ImmTy Type) {
     case ImmTySdwaSrc0Sel: OS << "SdwaSrc0Sel"; break;
     case ImmTySdwaSrc1Sel: OS << "SdwaSrc1Sel"; break;
     case ImmTySdwaDstUnused: OS << "SdwaDstUnused"; break;
     case ImmTyDMask: OS << "DMask"; break;
     case ImmTyDim: OS << "Dim"; break;
     case ImmTyUNorm: OS << "UNorm"; break;
     case ImmTyDA: OS << "DA"; break;
     case ImmTyR128A16: OS << "R128A16"; break;
+    case ImmTyA16: OS << "A16"; break;
     case ImmTyLWE: OS << "LWE"; break;
     case ImmTyOff: OS << "Off"; break;
     case ImmTyExpTgt: OS << "ExpTgt"; break;
     case ImmTyExpCompr: OS << "ExpCompr"; break;
     case ImmTyExpVM: OS << "ExpVM"; break;
     case ImmTyHwreg: OS << "Hwreg"; break;
     case ImmTySendMsg: OS << "SendMsg"; break;
     case ImmTyInterpSlot: OS << "InterpSlot"; break;
@@ (294 lines hidden) @@ public:
   bool hasMIMG_R128() const {
     return AMDGPU::hasMIMG_R128(getSTI());
   }

   bool hasPackedD16() const {
     return AMDGPU::hasPackedD16(getSTI());
   }

+  bool hasGFX10A16() const {
+    return AMDGPU::hasGFX10A16(getSTI());
+  }
+
   bool isSI() const {
     return AMDGPU::isSI(getSTI());
   }

   bool isCI() const {
     return AMDGPU::isCI(getSTI());
   }

@@ (3,477 lines hidden) @@ AMDGPUAsmParser::parseNamedBit(const char *Name, OperandVector &Operands,

   // We are at the end of the statement, and this is a default argument, so
   // use a default value.
   if (getLexer().isNot(AsmToken::EndOfStatement)) {
     switch(getLexer().getKind()) {
       case AsmToken::Identifier: {
         StringRef Tok = Parser.getTok().getString();
         if (Tok == Name) {
-          if (Tok == "r128" && isGFX9())
+          if (Tok == "r128" && !hasMIMG_R128())
             Error(S, "r128 modifier is not supported on this GPU");
-          if (Tok == "a16" && !isGFX9() && !isGFX10())
+          if (Tok == "a16" && !isGFX9() && !hasGFX10A16())
             Error(S, "a16 modifier is not supported on this GPU");
           Bit = 1;
           Parser.Lex();
         } else if (Tok.startswith("no") && Tok.endswith(Name)) {
           Bit = 0;
           Parser.Lex();
         } else {
           return MatchOperand_NoMatch;
         }
         break;
       }
       default:
         return MatchOperand_NoMatch;
     }
   }

   if (!isGFX10() && ImmTy == AMDGPUOperand::ImmTyDLC)
     return MatchOperand_ParseFail;

+  if (isGFX9() && ImmTy == AMDGPUOperand::ImmTyA16)
+    ImmTy = AMDGPUOperand::ImmTyR128A16;
+
   Operands.push_back(AMDGPUOperand::CreateImm(this, Bit, S, ImmTy));
   return MatchOperand_Success;
 }

 static void addOptionalImmOperand(
   MCInst& Inst, const OperandVector& Operands,
   AMDGPUAsmParser::OptionalImmIndexMap& OptionalIdx,
   AMDGPUOperand::ImmTy ImmT,
@@ (1,299 lines hidden) @@ void AMDGPUAsmParser::cvtMIMG(MCInst &Inst, const OperandVector &Operands,
   if (IsGFX10)
     addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyDim, -1);
   addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyUNorm);
   if (IsGFX10)
     addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyDLC);
   addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyGLC);
   addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTySLC);
   addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyR128A16);
+  if (IsGFX10)
+    addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyA16);
   addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyTFE);
   addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyLWE);
   if (!IsGFX10)
     addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyDA);
   addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyD16);
 }

 void AMDGPUAsmParser::cvtMIMGAtomic(MCInst &Inst, const OperandVector &Operands) {
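
The operand ordering in the visible part of cvtMIMG can be sketched in isolation as below. This is only a model: `optionalMIMGOperandOrder` and the lowercase operand labels are hypothetical names for this example, and operands appended in the hidden part of the function are not modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: lists the optional MIMG modifiers in the order the
// visible part of cvtMIMG appends them. An unbraced `if` guards only the
// single statement that follows it, as in the hunk.
std::vector<std::string> optionalMIMGOperandOrder(bool IsGFX10) {
  std::vector<std::string> Order;
  if (IsGFX10)
    Order.push_back("dim");
  Order.push_back("unorm");
  if (IsGFX10)
    Order.push_back("dlc");
  Order.push_back("glc");
  Order.push_back("slc");
  Order.push_back("r128a16");
  if (IsGFX10)
    Order.push_back("a16"); // the operand added by this patch
  Order.push_back("tfe");
  Order.push_back("lwe");
  if (!IsGFX10)
    Order.push_back("da");
  Order.push_back("d16");
  return Order;
}
```

On gfx10 the new a16 operand sits directly after the r128 slot, which matches the position of `GFX10A16:$a16` in the MIMG operand lists further down in the diff.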
@@ (93 lines hidden) @@ static const OptionalOperand AMDGPUOptionalOperandTable[] = {
   {"tfe", AMDGPUOperand::ImmTyTFE, true, nullptr},
   {"d16", AMDGPUOperand::ImmTyD16, true, nullptr},
   {"high", AMDGPUOperand::ImmTyHigh, true, nullptr},
   {"clamp", AMDGPUOperand::ImmTyClampSI, true, nullptr},
   {"omod", AMDGPUOperand::ImmTyOModSI, false, ConvertOmodMul},
   {"unorm", AMDGPUOperand::ImmTyUNorm, true, nullptr},
   {"da", AMDGPUOperand::ImmTyDA, true, nullptr},
   {"r128", AMDGPUOperand::ImmTyR128A16, true, nullptr},
-  {"a16", AMDGPUOperand::ImmTyR128A16, true, nullptr},
+  {"a16", AMDGPUOperand::ImmTyA16, true, nullptr},
   {"lwe", AMDGPUOperand::ImmTyLWE, true, nullptr},
   {"d16", AMDGPUOperand::ImmTyD16, true, nullptr},
   {"dmask", AMDGPUOperand::ImmTyDMask, false, nullptr},
   {"dim", AMDGPUOperand::ImmTyDim, false, nullptr},
   {"row_mask", AMDGPUOperand::ImmTyDppRowMask, false, nullptr},
   {"bank_mask", AMDGPUOperand::ImmTyDppBankMask, false, nullptr},
   {"bound_ctrl", AMDGPUOperand::ImmTyDppBoundCtrl, false, ConvertBoundCtrl},
   {"fi", AMDGPUOperand::ImmTyDppFi, false, nullptr},
@@ (964 lines hidden) @@
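
Taken together, the parseNamedBit and operand-table changes above appear to amount to the decision modelled below. This is a simplified sketch, not the parser code: `ModKind`, `TargetInfo`, and `classifyModifier` are hypothetical names, and the real parser calls Error() and keeps going where this model simply returns `Rejected`.

```cpp
#include <cassert>
#include <string>

// Operand kinds from the patch, plus a hypothetical Rejected value that
// stands in for the parser's error path.
enum class ModKind { Rejected, R128A16, A16 };

// Hypothetical stand-in for the subtarget queries the parser makes.
struct TargetInfo {
  bool IsGFX9;
  bool HasMIMGR128;
  bool HasGFX10A16;
};

ModKind classifyModifier(const std::string &Tok, const TargetInfo &T) {
  if (Tok == "r128")
    return T.HasMIMGR128 ? ModKind::R128A16 : ModKind::Rejected;
  if (Tok == "a16") {
    // a16 is accepted on gfx9 or on targets with the gfx10-style feature.
    if (!T.IsGFX9 && !T.HasGFX10A16)
      return ModKind::Rejected;
    // On gfx9 the a16 immediate is remapped onto the shared r128/a16 operand.
    return T.IsGFX9 ? ModKind::R128A16 : ModKind::A16;
  }
  return ModKind::Rejected;
}
```

The remapping on gfx9 is what lets the operand table point "a16" at the new ImmTyA16 without changing what gfx9 assembly produces.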

llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h

@@ (80 lines hidden) @@ private:
   void printDim(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
                 raw_ostream &O);
   void printUNorm(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
                   raw_ostream &O);
   void printDA(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
                raw_ostream &O);
   void printR128A16(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
                     raw_ostream &O);
+  void printGFX10A16(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
+                     raw_ostream &O);
   void printLWE(const MCInst *MI, unsigned OpNo,
                 const MCSubtargetInfo &STI, raw_ostream &O);
   void printD16(const MCInst *MI, unsigned OpNo,
                 const MCSubtargetInfo &STI, raw_ostream &O);
   void printExpCompr(const MCInst *MI, unsigned OpNo,
                      const MCSubtargetInfo &STI, raw_ostream &O);
   void printExpVM(const MCInst *MI, unsigned OpNo,
                   const MCSubtargetInfo &STI, raw_ostream &O);
@@ (172 lines hidden) @@

llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp

@@ (238 lines hidden) @@
 void AMDGPUInstPrinter::printR128A16(const MCInst *MI, unsigned OpNo,
                                      const MCSubtargetInfo &STI, raw_ostream &O) {
   if (STI.hasFeature(AMDGPU::FeatureR128A16))
     printNamedBit(MI, OpNo, O, "a16");
   else
     printNamedBit(MI, OpNo, O, "r128");
 }

+void AMDGPUInstPrinter::printGFX10A16(const MCInst *MI, unsigned OpNo,
+                                      const MCSubtargetInfo &STI, raw_ostream &O) {
+  printNamedBit(MI, OpNo, O, "a16");
+}
+
 void AMDGPUInstPrinter::printLWE(const MCInst *MI, unsigned OpNo,
                                  const MCSubtargetInfo &STI, raw_ostream &O) {
   printNamedBit(MI, OpNo, O, "lwe");
 }

 void AMDGPUInstPrinter::printD16(const MCInst *MI, unsigned OpNo,
                                  const MCSubtargetInfo &STI, raw_ostream &O) {
   printNamedBit(MI, OpNo, O, "d16");
@@ (1,295 lines hidden) @@
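
The difference between the two printers can be shown with a small string-based model. It assumes that printNamedBit emits a space followed by the name only when the operand's immediate is non-zero; that helper is not part of this diff, so treat the behaviour as an assumption. The function names mirror the patch, but these free functions are hypothetical stand-ins rather than the AMDGPUInstPrinter members.

```cpp
#include <cassert>
#include <string>

// Assumed behaviour of printNamedBit: print " <name>" only if the bit is set.
std::string namedBit(bool Bit, const std::string &Name) {
  return Bit ? " " + Name : std::string();
}

// gfx9-style operand: the same bit prints as "a16" when the target has the
// R128A16 feature, and as "r128" otherwise.
std::string printR128A16(bool Bit, bool HasR128A16) {
  return namedBit(Bit, HasR128A16 ? "a16" : "r128");
}

// gfx10-style operand: a dedicated bit that always prints as "a16".
std::string printGFX10A16(bool Bit) {
  return namedBit(Bit, "a16");
}
```

Because gfx10 instructions carry both operands, a gfx10 instruction with both bits set would print "r128" and "a16" as separate modifiers under this model.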

llvm/lib/Target/AMDGPU/MIMGInstructions.td

@@ (232 lines hidden) @@
 }

 class MIMG_NoSampler_gfx10<int op, string opcode,
     RegisterClass DataRC, RegisterClass AddrRC,
     string dns="">
   : MIMG_gfx10<op, (outs DataRC:$vdata), dns> {
   let InOperandList = !con((ins AddrRC:$vaddr0, SReg_256:$srsrc, DMask:$dmask,
       Dim:$dim, UNorm:$unorm, DLC:$dlc, GLC:$glc,
-      SLC:$slc, R128A16:$r128, TFE:$tfe, LWE:$lwe),
+      SLC:$slc, R128A16:$r128, GFX10A16:$a16, TFE:$tfe, LWE:$lwe),
       !if(BaseOpcode.HasD16, (ins D16:$d16), (ins)));
-  let AsmString = opcode#" $vdata, $vaddr0, $srsrc$dmask$dim$unorm$dlc$glc$slc$r128$tfe$lwe"
+  let AsmString = opcode#" $vdata, $vaddr0, $srsrc$dmask$dim$unorm$dlc$glc$slc$r128$a16$tfe$lwe"
       #!if(BaseOpcode.HasD16, "$d16", "");
 }

 class MIMG_NoSampler_nsa_gfx10<int op, string opcode,
     RegisterClass DataRC, int num_addrs,
     string dns="">
   : MIMG_nsa_gfx10<op, (outs DataRC:$vdata), num_addrs, dns> {
   let InOperandList = !con(AddrIns,
       (ins SReg_256:$srsrc, DMask:$dmask,
       Dim:$dim, UNorm:$unorm, DLC:$dlc, GLC:$glc,
-      SLC:$slc, R128A16:$r128, TFE:$tfe, LWE:$lwe),
+      SLC:$slc, R128A16:$r128, GFX10A16:$a16, TFE:$tfe, LWE:$lwe),
       !if(BaseOpcode.HasD16, (ins D16:$d16), (ins)));
-  let AsmString = opcode#" $vdata, "#AddrAsm#", $srsrc$dmask$dim$unorm$dlc$glc$slc$r128$tfe$lwe"
+  let AsmString = opcode#" $vdata, "#AddrAsm#", $srsrc$dmask$dim$unorm$dlc$glc$slc$r128$a16$tfe$lwe"
       #!if(BaseOpcode.HasD16, "$d16", "");
 }

 multiclass MIMG_NoSampler_Src_Helper <bits<8> op, string asm,
     RegisterClass dst_rc,
     bit enableDisasm> {
   let ssamp = 0 in {
     let VAddrDwords = 1 in {
@@ (61 lines hidden) @@
 }

 class MIMG_Store_gfx10<int op, string opcode,
     RegisterClass DataRC, RegisterClass AddrRC,
     string dns="">
   : MIMG_gfx10<op, (outs), dns> {
   let InOperandList = !con((ins DataRC:$vdata, AddrRC:$vaddr0, SReg_256:$srsrc,
       DMask:$dmask, Dim:$dim, UNorm:$unorm, DLC:$dlc,
-      GLC:$glc, SLC:$slc, R128A16:$r128, TFE:$tfe, LWE:$lwe),
+      GLC:$glc, SLC:$slc, R128A16:$r128, GFX10A16:$a16, TFE:$tfe, LWE:$lwe),
       !if(BaseOpcode.HasD16, (ins D16:$d16), (ins)));
-  let AsmString = opcode#" $vdata, $vaddr0, $srsrc$dmask$dim$unorm$dlc$glc$slc$r128$tfe$lwe"
+  let AsmString = opcode#" $vdata, $vaddr0, $srsrc$dmask$dim$unorm$dlc$glc$slc$r128$a16$tfe$lwe"
       #!if(BaseOpcode.HasD16, "$d16", "");
 }

 class MIMG_Store_nsa_gfx10<int op, string opcode,
     RegisterClass DataRC, int num_addrs,
     string dns="">
   : MIMG_nsa_gfx10<op, (outs), num_addrs, dns> {
   let InOperandList = !con((ins DataRC:$vdata),
       AddrIns,
       (ins SReg_256:$srsrc, DMask:$dmask,
       Dim:$dim, UNorm:$unorm, DLC:$dlc, GLC:$glc,
-      SLC:$slc, R128A16:$r128, TFE:$tfe, LWE:$lwe),
+      SLC:$slc, R128A16:$r128, GFX10A16:$a16, TFE:$tfe, LWE:$lwe),
       !if(BaseOpcode.HasD16, (ins D16:$d16), (ins)));
-  let AsmString = opcode#" $vdata, "#AddrAsm#", $srsrc$dmask$dim$unorm$dlc$glc$slc$r128$tfe$lwe"
+  let AsmString = opcode#" $vdata, "#AddrAsm#", $srsrc$dmask$dim$unorm$dlc$glc$slc$r128$a16$tfe$lwe"
       #!if(BaseOpcode.HasD16, "$d16", "");
 }

 multiclass MIMG_Store_Addr_Helper <int op, string asm,
     RegisterClass data_rc,
     bit enableDisasm> {
   let mayLoad = 0, mayStore = 1, hasSideEffects = 0, hasPostISelHook = 0,
       DisableWQM = 1, ssamp = 0 in {
@@ (72 lines hidden) @@ class MIMG_Atomic_gfx10<mimg op, string opcode,
     bit enableDisasm = 0>
   : MIMG_gfx10<!cast<int>(op.SI_GFX10), (outs DataRC:$vdst),
       !if(enableDisasm, "AMDGPU", "")> {
   let Constraints = "$vdst = $vdata";
   let AsmMatchConverter = "cvtMIMGAtomic";

   let InOperandList = (ins DataRC:$vdata, AddrRC:$vaddr0, SReg_256:$srsrc,
       DMask:$dmask, Dim:$dim, UNorm:$unorm, DLC:$dlc,
-      GLC:$glc, SLC:$slc, R128A16:$r128, TFE:$tfe, LWE:$lwe);
+      GLC:$glc, SLC:$slc, R128A16:$r128, GFX10A16:$a16, TFE:$tfe, LWE:$lwe);
-  let AsmString = opcode#" $vdst, $vaddr0, $srsrc$dmask$dim$unorm$dlc$glc$slc$r128$tfe$lwe";
+  let AsmString = opcode#" $vdst, $vaddr0, $srsrc$dmask$dim$unorm$dlc$glc$slc$r128$a16$tfe$lwe";
 }

 class MIMG_Atomic_nsa_gfx10<mimg op, string opcode,
     RegisterClass DataRC, int num_addrs,
     bit enableDisasm = 0>
   : MIMG_nsa_gfx10<!cast<int>(op.SI_GFX10), (outs DataRC:$vdst), num_addrs,
       !if(enableDisasm, "AMDGPU", "")> {
   let Constraints = "$vdst = $vdata";
   let AsmMatchConverter = "cvtMIMGAtomic";

   let InOperandList = !con((ins DataRC:$vdata),
       AddrIns,
       (ins SReg_256:$srsrc, DMask:$dmask,
       Dim:$dim, UNorm:$unorm, DLC:$dlc, GLC:$glc,
-      SLC:$slc, R128A16:$r128, TFE:$tfe, LWE:$lwe));
+      SLC:$slc, R128A16:$r128, GFX10A16:$a16, TFE:$tfe, LWE:$lwe));
-  let AsmString = opcode#" $vdata, "#AddrAsm#", $srsrc$dmask$dim$unorm$dlc$glc$slc$r128$tfe$lwe";
+  let AsmString = opcode#" $vdata, "#AddrAsm#", $srsrc$dmask$dim$unorm$dlc$glc$slc$r128$a16$tfe$lwe";
 }

 multiclass MIMG_Atomic_Addr_Helper_m <mimg op, string asm,
     RegisterClass data_rc,
     bit enableDasm = 0> {
   let hasSideEffects = 1, // FIXME: remove this
       mayLoad = 1, mayStore = 1, hasPostISelHook = 0, DisableWQM = 1,
       ssamp = 0 in {
[...]
}		}

class MIMG_Sampler_gfx10<int op, string opcode,		class MIMG_Sampler_gfx10<int op, string opcode,
RegisterClass DataRC, RegisterClass AddrRC,		RegisterClass DataRC, RegisterClass AddrRC,
string dns="">		string dns="">
: MIMG_gfx10<op, (outs DataRC:$vdata), dns> {		: MIMG_gfx10<op, (outs DataRC:$vdata), dns> {
let InOperandList = !con((ins AddrRC:$vaddr0, SReg_256:$srsrc, SReg_128:$ssamp,		let InOperandList = !con((ins AddrRC:$vaddr0, SReg_256:$srsrc, SReg_128:$ssamp,
DMask:$dmask, Dim:$dim, UNorm:$unorm, DLC:$dlc,		DMask:$dmask, Dim:$dim, UNorm:$unorm, DLC:$dlc,
GLC:$glc, SLC:$slc, R128A16:$r128, TFE:$tfe, LWE:$lwe),		GLC:$glc, SLC:$slc, R128A16:$r128, GFX10A16:$a16, TFE:$tfe, LWE:$lwe),
!if(BaseOpcode.HasD16, (ins D16:$d16), (ins)));		!if(BaseOpcode.HasD16, (ins D16:$d16), (ins)));
let AsmString = opcode#" $vdata, $vaddr0, $srsrc, $ssamp$dmask$dim$unorm"		let AsmString = opcode#" $vdata, $vaddr0, $srsrc, $ssamp$dmask$dim$unorm"
#"$dlc$glc$slc$r128$tfe$lwe"		#"$dlc$glc$slc$r128$a16$tfe$lwe"
#!if(BaseOpcode.HasD16, "$d16", "");		#!if(BaseOpcode.HasD16, "$d16", "");
}		}

class MIMG_Sampler_nsa_gfx10<int op, string opcode,		class MIMG_Sampler_nsa_gfx10<int op, string opcode,
RegisterClass DataRC, int num_addrs,		RegisterClass DataRC, int num_addrs,
string dns="">		string dns="">
: MIMG_nsa_gfx10<op, (outs DataRC:$vdata), num_addrs, dns> {		: MIMG_nsa_gfx10<op, (outs DataRC:$vdata), num_addrs, dns> {
let InOperandList = !con(AddrIns,		let InOperandList = !con(AddrIns,
(ins SReg_256:$srsrc, SReg_128:$ssamp, DMask:$dmask,		(ins SReg_256:$srsrc, SReg_128:$ssamp, DMask:$dmask,
Dim:$dim, UNorm:$unorm, DLC:$dlc, GLC:$glc,		Dim:$dim, UNorm:$unorm, DLC:$dlc, GLC:$glc,
SLC:$slc, R128A16:$r128, TFE:$tfe, LWE:$lwe),		SLC:$slc, R128A16:$r128, GFX10A16:$a16, TFE:$tfe, LWE:$lwe),
!if(BaseOpcode.HasD16, (ins D16:$d16), (ins)));		!if(BaseOpcode.HasD16, (ins D16:$d16), (ins)));
let AsmString = opcode#" $vdata, "#AddrAsm#", $srsrc, $ssamp$dmask$dim$unorm"		let AsmString = opcode#" $vdata, "#AddrAsm#", $srsrc, $ssamp$dmask$dim$unorm"
#"$dlc$glc$slc$r128$tfe$lwe"		#"$dlc$glc$slc$r128$a16$tfe$lwe"
#!if(BaseOpcode.HasD16, "$d16", "");		#!if(BaseOpcode.HasD16, "$d16", "");
}		}

class MIMGAddrSize<int dw, bit enable_disasm> {		class MIMGAddrSize<int dw, bit enable_disasm> {
int NumWords = dw;		int NumWords = dw;

RegisterClass RegClass = !if(!le(NumWords, 0), ?,		RegisterClass RegClass = !if(!le(NumWords, 0), ?,
!if(!eq(NumWords, 1), VGPR_32,		!if(!eq(NumWords, 1), VGPR_32,
[...]

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

[...]	SDValue SITargetLowering::lowerImage(SDValue Op,
}		}

// Check for 16 bit addresses and pack if true.		// Check for 16 bit addresses and pack if true.
unsigned DimIdx = AddrIdx + BaseOpcode->NumExtraArgs;		unsigned DimIdx = AddrIdx + BaseOpcode->NumExtraArgs;
MVT VAddrVT = Op.getOperand(DimIdx).getSimpleValueType();		MVT VAddrVT = Op.getOperand(DimIdx).getSimpleValueType();
const MVT VAddrScalarVT = VAddrVT.getScalarType();		const MVT VAddrScalarVT = VAddrVT.getScalarType();
if (((VAddrScalarVT == MVT::f16) || (VAddrScalarVT == MVT::i16))) {		if (((VAddrScalarVT == MVT::f16) || (VAddrScalarVT == MVT::i16))) {
// Illegal to use a16 images		// Illegal to use a16 images
if (!ST->hasFeature(AMDGPU::FeatureR128A16))		if (!ST->hasFeature(AMDGPU::FeatureR128A16) && !ST->hasFeature(AMDGPU::FeatureGFX10A16))
return Op;		return Op;

IsA16 = true;		IsA16 = true;
const MVT VectorVT = VAddrScalarVT == MVT::f16 ? MVT::v2f16 : MVT::v2i16;		const MVT VectorVT = VAddrScalarVT == MVT::f16 ? MVT::v2f16 : MVT::v2i16;
for (unsigned i = AddrIdx; i < (AddrIdx + NumMIVAddrs); ++i) {		for (unsigned i = AddrIdx; i < (AddrIdx + NumMIVAddrs); ++i) {
SDValue AddrLo;		SDValue AddrLo;
// Push back extra arguments.		// Push back extra arguments.
if (i < DimIdx) {		if (i < DimIdx) {
[...]	SDValue SITargetLowering::lowerImage(SDValue Op,
Ops.push_back(DAG.getTargetConstant(DMask, DL, MVT::i32));		Ops.push_back(DAG.getTargetConstant(DMask, DL, MVT::i32));
if (IsGFX10)		if (IsGFX10)
Ops.push_back(DAG.getTargetConstant(DimInfo->Encoding, DL, MVT::i32));		Ops.push_back(DAG.getTargetConstant(DimInfo->Encoding, DL, MVT::i32));
Ops.push_back(Unorm);		Ops.push_back(Unorm);
if (IsGFX10)		if (IsGFX10)
Ops.push_back(DLC);		Ops.push_back(DLC);
Ops.push_back(GLC);		Ops.push_back(GLC);
Ops.push_back(SLC);		Ops.push_back(SLC);
Ops.push_back(IsA16 && // a16 or r128		Ops.push_back(IsA16 && // r128, a16 for gfx9
ST->hasFeature(AMDGPU::FeatureR128A16) ? True : False);		ST->hasFeature(AMDGPU::FeatureR128A16) ? True : False);
Ops.push_back(TFE); // tfe		if (IsGFX10)
Ops.push_back(LWE); // lwe		Ops.push_back(IsA16 ? True : False);
		Ops.push_back(TFE);
		Ops.push_back(LWE);
if (!IsGFX10)		if (!IsGFX10)
Ops.push_back(DimInfo->DA ? True : False);		Ops.push_back(DimInfo->DA ? True : False);
if (BaseOpcode->HasD16)		if (BaseOpcode->HasD16)
Ops.push_back(IsD16 ? True : False);		Ops.push_back(IsD16 ? True : False);
if (isa<MemSDNode>(Op))		if (isa<MemSDNode>(Op))
Ops.push_back(Op.getOperand(0)); // chain		Ops.push_back(Op.getOperand(0)); // chain

int NumVAddrDwords =		int NumVAddrDwords =
[...]
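
The operand ordering in the lowerImage hunk above can be modelled in isolation. The following is a minimal sketch, not the LLVM code: `SubtargetSketch` and `buildA16Flags` are hypothetical names, and the sketch assumes a subtarget has at most one of the two A16 features. It shows that the r128 operand only carries A16 on gfx9-style targets, while gfx10 gets a separate a16 operand pushed right after it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the subtarget queries used in the diff.
struct SubtargetSketch {
  bool HasR128A16;  // gfx9-style: a16 is aliased with the r128 bit
  bool HasGFX10A16; // gfx10-style: a16 has its own operand
  bool IsGFX10;
};

// Returns the r128 / a16 immediates in the order the new code pushes them.
std::vector<bool> buildA16Flags(const SubtargetSketch &ST, bool IsA16) {
  // Assumption of this sketch: the two feature styles are mutually exclusive.
  assert(!(ST.HasR128A16 && ST.HasGFX10A16));

  std::vector<bool> Ops;
  Ops.push_back(IsA16 && ST.HasR128A16); // r128 operand (doubles as a16 on gfx9)
  if (ST.IsGFX10)
    Ops.push_back(IsA16);                // dedicated a16 operand on gfx10
  return Ops;
}
```

With this model a gfx10 instruction using 16-bit addresses ends up with r128 = 0 and a16 = 1, whereas a gfx9 instruction only has the shared r128 operand set.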

llvm/lib/Target/AMDGPU/SIInstrFormats.td

[...]	class MIMGe_gfx6789 <bits<8> op> : MIMGe {
let Inst{39-32} = vaddr;		let Inst{39-32} = vaddr;
}		}

class MIMGe_gfx10 <bits<8> op> : MIMGe {		class MIMGe_gfx10 <bits<8> op> : MIMGe {
bits<8> vaddr0;		bits<8> vaddr0;
bits<3> dim;		bits<3> dim;
bits<2> nsa;		bits<2> nsa;
bits<1> dlc;		bits<1> dlc;
bits<1> a16 = 0; // TODO: this should be an operand		bits<1> a16;

let Inst{0} = op{7};		let Inst{0} = op{7};
let Inst{2-1} = nsa;		let Inst{2-1} = nsa;
let Inst{5-3} = dim;		let Inst{5-3} = dim;
let Inst{7} = dlc;		let Inst{7} = dlc;
let Inst{24-18} = op{6-0};		let Inst{24-18} = op{6-0};
let Inst{39-32} = vaddr0;		let Inst{39-32} = vaddr0;
let Inst{62} = a16;		let Inst{62} = a16;
[...]
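
To make the bit layout in the MIMGe_gfx10 hunk concrete, here is a small sketch that packs only the fields visible above into a 64-bit word. `MIMGGfx10Fields` and `encodeShownFields` are hypothetical names; fields inherited from the MIMGe base class are not shown in this diff and are left as zero here, so the result is not a complete instruction encoding.

```cpp
#include <cassert>
#include <cstdint>

// Only the fields that appear in the MIMGe_gfx10 hunk; everything else is omitted.
struct MIMGGfx10Fields {
  uint8_t Op;     // 8-bit opcode, split across Inst{0} and Inst{24-18}
  uint8_t Nsa;    // 2 bits
  uint8_t Dim;    // 3 bits
  bool Dlc;
  uint8_t VAddr0; // 8 bits
  bool A16;       // now a real operand instead of a constant 0
};

uint64_t encodeShownFields(const MIMGGfx10Fields &F) {
  assert(F.Nsa < 4 && F.Dim < 8 && "field wider than its encoding slot");
  uint64_t Inst = 0;
  Inst |= uint64_t((F.Op >> 7) & 0x1) << 0; // Inst{0}     = op{7}
  Inst |= uint64_t(F.Nsa) << 1;             // Inst{2-1}   = nsa
  Inst |= uint64_t(F.Dim) << 3;             // Inst{5-3}   = dim
  Inst |= uint64_t(F.Dlc) << 7;             // Inst{7}     = dlc
  Inst |= uint64_t(F.Op & 0x7f) << 18;      // Inst{24-18} = op{6-0}
  Inst |= uint64_t(F.VAddr0) << 32;         // Inst{39-32} = vaddr0
  Inst |= uint64_t(F.A16) << 62;            // Inst{62}    = a16
  return Inst;
}
```

Setting only `A16` yields a word with just bit 62 set, which is the bit the assembler and disassembler now read from the new operand.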

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

[...]	if (DimOp) {
const AMDGPU::MIMGDimInfo *Dim =		const AMDGPU::MIMGDimInfo *Dim =
AMDGPU::getMIMGDimInfoByEncoding(DimOp->getImm());		AMDGPU::getMIMGDimInfoByEncoding(DimOp->getImm());

if (!Dim) {		if (!Dim) {
ErrInfo = "dim is out of range";		ErrInfo = "dim is out of range";
return false;		return false;
}		}

		bool IsA16 = false;
		if (ST.hasR128A16()) {
		const MachineOperand *R128A16 = getNamedOperand(MI, AMDGPU::OpName::r128);
		IsA16 = R128A16->getImm() != 0;
		} else if (ST.hasGFX10A16()) {
		const MachineOperand *A16 = getNamedOperand(MI, AMDGPU::OpName::a16);
		IsA16 = A16->getImm() != 0;
		}

		bool PackDerivatives = IsA16; // Either A16 or G16
bool IsNSA = SRsrcIdx - VAddr0Idx > 1;		bool IsNSA = SRsrcIdx - VAddr0Idx > 1;
unsigned AddrWords = BaseOpcode->NumExtraArgs +
(BaseOpcode->Gradients ? Dim->NumGradients : 0) +		unsigned AddrWords = BaseOpcode->NumExtraArgs;
(BaseOpcode->Coordinates ? Dim->NumCoords : 0) +		unsigned AddrComponents = (BaseOpcode->Coordinates ? Dim->NumCoords : 0) +
(BaseOpcode->LodOrClampOrMip ? 1 : 0);		(BaseOpcode->LodOrClampOrMip ? 1 : 0);
		if (IsA16)
		AddrWords += (AddrComponents + 1) / 2;
		else
		AddrWords += AddrComponents;

		if (BaseOpcode->Gradients) {
		if (PackDerivatives)
		// There are two gradients per coordinate, we pack them separately.
		// For the 3d case, we get (dy/du, dx/du) (-, dz/du) (dy/dv, dx/dv) (-, dz/dv)
		AddrWords += (Dim->NumGradients / 2 + 1) / 2 * 2;
		else
		AddrWords += Dim->NumGradients;
		}

unsigned VAddrWords;		unsigned VAddrWords;
if (IsNSA) {		if (IsNSA) {
VAddrWords = SRsrcIdx - VAddr0Idx;		VAddrWords = SRsrcIdx - VAddr0Idx;
} else {		} else {
const TargetRegisterClass *RC = getOpRegClass(MI, VAddr0Idx);		const TargetRegisterClass *RC = getOpRegClass(MI, VAddr0Idx);
VAddrWords = MRI.getTargetRegisterInfo()->getRegSizeInBits(*RC) / 32;		VAddrWords = MRI.getTargetRegisterInfo()->getRegSizeInBits(*RC) / 32;
if (AddrWords > 8)		if (AddrWords > 8)
AddrWords = 16;		AddrWords = 16;
else if (AddrWords > 4)		else if (AddrWords > 4)
AddrWords = 8;		AddrWords = 8;
else if (AddrWords == 3 && VAddrWords == 4) {		else if (AddrWords == 4)
// CodeGen uses the V4 variant of instructions for three addresses,
// because the selection DAG does not support non-power-of-two types.
AddrWords = 4;		AddrWords = 4;
}		else if (AddrWords == 3)
		AddrWords = 3;
}		}

if (VAddrWords != AddrWords) {		if (VAddrWords != AddrWords) {
ErrInfo = "bad vaddr size";		ErrInfo = "bad vaddr size";
return false;		return false;
}		}
}		}
}		}
[...]
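
The address-size arithmetic added to the verifier above can be checked by hand with a standalone sketch. `AddrShape`, `expectedAddrWords` and `roundToRegisterTuple` are hypothetical names that mirror the new-side logic; the example values used afterwards (3 coordinates and 6 gradients for a 3D image) are assumptions for illustration.

```cpp
#include <cassert>

// Hypothetical flattening of the BaseOpcode / Dim fields the verifier reads.
struct AddrShape {
  unsigned NumExtraArgs;
  bool HasCoordinates;
  unsigned NumCoords;
  bool HasLodOrClampOrMip;
  bool HasGradients;
  unsigned NumGradients;
};

// Number of address dwords the instruction is expected to consume.
unsigned expectedAddrWords(const AddrShape &S, bool IsA16) {
  unsigned AddrWords = S.NumExtraArgs;
  unsigned AddrComponents = (S.HasCoordinates ? S.NumCoords : 0) +
                            (S.HasLodOrClampOrMip ? 1 : 0);
  // With A16, two 16-bit components share one dword (rounded up).
  AddrWords += IsA16 ? (AddrComponents + 1) / 2 : AddrComponents;

  if (S.HasGradients) {
    assert(S.NumGradients % 2 == 0 && "two gradients per coordinate");
    if (IsA16) // the two derivative sets are packed separately
      AddrWords += (S.NumGradients / 2 + 1) / 2 * 2;
    else
      AddrWords += S.NumGradients;
  }
  return AddrWords;
}

// Non-NSA case: round up to the register tuple sizes used in the diff.
unsigned roundToRegisterTuple(unsigned AddrWords) {
  if (AddrWords > 8)
    return 16;
  if (AddrWords > 4)
    return 8;
  return AddrWords;
}
```

For a plain 3D load with A16, three 16-bit coordinates pack into two dwords, which agrees with the `v[0:1]` address operand in the load_3d test checks.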

llvm/lib/Target/AMDGPU/SIInstrInfo.td

	[...]
	def DLC : NamedOperandBit<"DLC", NamedMatchClass<"DLC">>;			def DLC : NamedOperandBit<"DLC", NamedMatchClass<"DLC">>;
	def GLC : NamedOperandBit<"GLC", NamedMatchClass<"GLC">>;			def GLC : NamedOperandBit<"GLC", NamedMatchClass<"GLC">>;
	def SLC : NamedOperandBit<"SLC", NamedMatchClass<"SLC">>;			def SLC : NamedOperandBit<"SLC", NamedMatchClass<"SLC">>;
	def TFE : NamedOperandBit<"TFE", NamedMatchClass<"TFE">>;			def TFE : NamedOperandBit<"TFE", NamedMatchClass<"TFE">>;
	def SWZ : NamedOperandBit<"SWZ", NamedMatchClass<"SWZ">>;			def SWZ : NamedOperandBit<"SWZ", NamedMatchClass<"SWZ">>;
	def UNorm : NamedOperandBit<"UNorm", NamedMatchClass<"UNorm">>;			def UNorm : NamedOperandBit<"UNorm", NamedMatchClass<"UNorm">>;
	def DA : NamedOperandBit<"DA", NamedMatchClass<"DA">>;			def DA : NamedOperandBit<"DA", NamedMatchClass<"DA">>;
	def R128A16 : NamedOperandBit<"R128A16", NamedMatchClass<"R128A16">>;			def R128A16 : NamedOperandBit<"R128A16", NamedMatchClass<"R128A16">>;
				def GFX10A16 : NamedOperandBit<"GFX10A16", NamedMatchClass<"GFX10A16">>;
	def D16 : NamedOperandBit<"D16", NamedMatchClass<"D16">>;			def D16 : NamedOperandBit<"D16", NamedMatchClass<"D16">>;
	def LWE : NamedOperandBit<"LWE", NamedMatchClass<"LWE">>;			def LWE : NamedOperandBit<"LWE", NamedMatchClass<"LWE">>;
	def exp_compr : NamedOperandBit<"ExpCompr", NamedMatchClass<"ExpCompr">>;			def exp_compr : NamedOperandBit<"ExpCompr", NamedMatchClass<"ExpCompr">>;
	def exp_vm : NamedOperandBit<"ExpVM", NamedMatchClass<"ExpVM">>;			def exp_vm : NamedOperandBit<"ExpVM", NamedMatchClass<"ExpVM">>;

	def FORMAT : NamedOperandU8<"FORMAT", NamedMatchClass<"FORMAT">>;			def FORMAT : NamedOperandU8<"FORMAT", NamedMatchClass<"FORMAT">>;

	def DMask : NamedOperandU16<"DMask", NamedMatchClass<"DMask">>;			def DMask : NamedOperandU16<"DMask", NamedMatchClass<"DMask">>;
	[...]

llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp

[...]	bool SILoadStoreOptimizer::dmasksCanBeCombined(const CombineInfo &CI,
const auto LWEOp = TII.getNamedOperand(CI.I, AMDGPU::OpName::lwe);		const auto LWEOp = TII.getNamedOperand(CI.I, AMDGPU::OpName::lwe);

if ((TFEOp && TFEOp->getImm()) || (LWEOp && LWEOp->getImm()))		if ((TFEOp && TFEOp->getImm()) || (LWEOp && LWEOp->getImm()))
return false;		return false;

// Check other optional immediate operands for equality.		// Check other optional immediate operands for equality.
unsigned OperandsToMatch[] = {AMDGPU::OpName::glc, AMDGPU::OpName::slc,		unsigned OperandsToMatch[] = {AMDGPU::OpName::glc, AMDGPU::OpName::slc,
AMDGPU::OpName::d16, AMDGPU::OpName::unorm,		AMDGPU::OpName::d16, AMDGPU::OpName::unorm,
AMDGPU::OpName::da, AMDGPU::OpName::r128};		AMDGPU::OpName::da, AMDGPU::OpName::r128,
		AMDGPU::OpName::a16};

for (auto op : OperandsToMatch) {		for (auto op : OperandsToMatch) {
int Idx = AMDGPU::getNamedOperandIdx(CI.I->getOpcode(), op);		int Idx = AMDGPU::getNamedOperandIdx(CI.I->getOpcode(), op);
if (AMDGPU::getNamedOperandIdx(Paired.I->getOpcode(), op) != Idx)		if (AMDGPU::getNamedOperandIdx(Paired.I->getOpcode(), op) != Idx)
return false;		return false;
if (Idx != -1 &&		if (Idx != -1 &&
CI.I->getOperand(Idx).getImm() != Paired.I->getOperand(Idx).getImm())		CI.I->getOperand(Idx).getImm() != Paired.I->getOperand(Idx).getImm())
return false;		return false;
[...]
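
The operand-equality loop in dmasksCanBeCombined can be sketched without any LLVM types. This is a simplified model under stated assumptions: `InstrSketch`, `getIdx`, `getImm` and `optionalOperandsMatch` are hypothetical, and an instruction is reduced to a map from operand name to its index and immediate. It shows why adding a16 to the list prevents merging two image loads that differ only in that bit.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model: operand name -> (operand index, immediate value).
struct NamedImm {
  int Idx;
  long Imm;
};
using InstrSketch = std::map<std::string, NamedImm>;

// Mirrors getNamedOperandIdx: -1 when the opcode has no such operand.
int getIdx(const InstrSketch &I, const std::string &Name) {
  auto It = I.find(Name);
  return It == I.end() ? -1 : It->second.Idx;
}

long getImm(const InstrSketch &I, const std::string &Name) {
  auto It = I.find(Name);
  assert(It != I.end() && "operand must exist");
  return It->second.Imm;
}

bool optionalOperandsMatch(const InstrSketch &A, const InstrSketch &B) {
  static const char *const OperandsToMatch[] = {"glc",  "slc",  "d16", "unorm",
                                                "da",   "r128", "a16"};
  for (const char *Name : OperandsToMatch) {
    int Idx = getIdx(A, Name);
    if (getIdx(B, Name) != Idx)
      return false; // operand present in one opcode but not the other
    if (Idx != -1 && getImm(A, Name) != getImm(B, Name))
      return false; // same operand, different immediate
  }
  return true;
}
```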

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h

[...]	inline bool isKernel(CallingConv::ID CC) {
default:		default:
return false;		return false;
}		}
}		}

bool hasXNACK(const MCSubtargetInfo &STI);		bool hasXNACK(const MCSubtargetInfo &STI);
bool hasSRAMECC(const MCSubtargetInfo &STI);		bool hasSRAMECC(const MCSubtargetInfo &STI);
bool hasMIMG_R128(const MCSubtargetInfo &STI);		bool hasMIMG_R128(const MCSubtargetInfo &STI);
		bool hasGFX10A16(const MCSubtargetInfo &STI);
bool hasPackedD16(const MCSubtargetInfo &STI);		bool hasPackedD16(const MCSubtargetInfo &STI);

bool isSI(const MCSubtargetInfo &STI);		bool isSI(const MCSubtargetInfo &STI);
bool isCI(const MCSubtargetInfo &STI);		bool isCI(const MCSubtargetInfo &STI);
bool isVI(const MCSubtargetInfo &STI);		bool isVI(const MCSubtargetInfo &STI);
bool isGFX9(const MCSubtargetInfo &STI);		bool isGFX9(const MCSubtargetInfo &STI);
bool isGFX10(const MCSubtargetInfo &STI);		bool isGFX10(const MCSubtargetInfo &STI);

[...]

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp

[...]	bool hasXNACK(const MCSubtargetInfo &STI) {
return STI.getFeatureBits()[AMDGPU::FeatureXNACK];		return STI.getFeatureBits()[AMDGPU::FeatureXNACK];
}		}

bool hasSRAMECC(const MCSubtargetInfo &STI) {		bool hasSRAMECC(const MCSubtargetInfo &STI) {
return STI.getFeatureBits()[AMDGPU::FeatureSRAMECC];		return STI.getFeatureBits()[AMDGPU::FeatureSRAMECC];
}		}

bool hasMIMG_R128(const MCSubtargetInfo &STI) {		bool hasMIMG_R128(const MCSubtargetInfo &STI) {
return STI.getFeatureBits()[AMDGPU::FeatureMIMG_R128];		return STI.getFeatureBits()[AMDGPU::FeatureMIMG_R128] && !STI.getFeatureBits()[AMDGPU::FeatureR128A16];
		}

		bool hasGFX10A16(const MCSubtargetInfo &STI) {
		return STI.getFeatureBits()[AMDGPU::FeatureGFX10A16];
}		}

bool hasPackedD16(const MCSubtargetInfo &STI) {		bool hasPackedD16(const MCSubtargetInfo &STI) {
return !STI.getFeatureBits()[AMDGPU::FeatureUnpackedD16VMem];		return !STI.getFeatureBits()[AMDGPU::FeatureUnpackedD16VMem];
}		}

bool isSI(const MCSubtargetInfo &STI) {		bool isSI(const MCSubtargetInfo &STI) {
return STI.getFeatureBits()[AMDGPU::FeatureSouthernIslands];		return STI.getFeatureBits()[AMDGPU::FeatureSouthernIslands];
[...]
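
The changed feature predicates can be illustrated with a plain bitset. This is only a sketch: `FeatureSketch`, `FeatureBits` and `makeFeatures` are hypothetical stand-ins for the generated AMDGPU feature enum and `MCSubtargetInfo::getFeatureBits()`, and the feature combinations used to exercise it are made up rather than taken from a real processor definition.

```cpp
#include <bitset>
#include <cassert>
#include <initializer_list>

// Hypothetical subset of the subtarget features touched by this diff.
enum FeatureSketch { FeatureMIMG_R128, FeatureR128A16, FeatureGFX10A16, NumFeatures };
using FeatureBits = std::bitset<NumFeatures>;

FeatureBits makeFeatures(std::initializer_list<FeatureSketch> Fs) {
  FeatureBits FB;
  for (FeatureSketch F : Fs) {
    assert(F < NumFeatures);
    FB.set(F);
  }
  return FB;
}

// After the change, plain r128 is only reported when the bit is not
// shared with a16 (the gfx9-style R128A16 feature).
bool hasMIMG_R128(const FeatureBits &FB) {
  return FB[FeatureMIMG_R128] && !FB[FeatureR128A16];
}

bool hasGFX10A16(const FeatureBits &FB) { return FB[FeatureGFX10A16]; }
```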

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.dim.ll

	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s
				; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s

	; GCN-LABEL: {{^}}load_1d:
	; GCN: image_load v[0:3], v0, s[0:7] dmask:0xf unorm a16
	define amdgpu_ps <4 x float> @load_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_1d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load_2d:
	; GCN: image_load v[0:3], v0, s[0:7] dmask:0xf unorm a16
	define amdgpu_ps <4 x float> @load_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_2d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	%t = extractelement <2 x i16> %coords, i32 1			%t = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 15, i16 %s, i16 %t, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 15, i16 %s, i16 %t, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load_3d:
	; GCN: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16
	define amdgpu_ps <4 x float> @load_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x float> @load_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_3d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_3D unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%r = extractelement <2 x i16> %coords_hi, i32 0			%r = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %r, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %r, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load_cube:
	; GCN: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps <4 x float> @load_cube(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x float> @load_cube(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_cube:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_cube:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_CUBE unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%slice = extractelement <2 x i16> %coords_hi, i32 0			%slice = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.cube.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.cube.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load_1darray:
	; GCN: image_load v[0:3], v0, s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps <4 x float> @load_1darray(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load_1darray(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_1darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_1darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D_ARRAY unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	%slice = extractelement <2 x i16> %coords, i32 1			%slice = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x float> @llvm.amdgcn.image.load.1darray.v4f32.i16(i32 15, i16 %s, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.1darray.v4f32.i16(i32 15, i16 %s, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load_2darray:
	; GCN: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps <4 x float> @load_2darray(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x float> @load_2darray(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_2darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_2darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_ARRAY unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%slice = extractelement <2 x i16> %coords_hi, i32 0			%slice = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.2darray.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.2darray.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load_2dmsaa:
	; GCN: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16
	define amdgpu_ps <4 x float> @load_2dmsaa(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x float> @load_2dmsaa(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_2dmsaa:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_2dmsaa:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_MSAA unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%fragid = extractelement <2 x i16> %coords_hi, i32 0			%fragid = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.2dmsaa.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.2dmsaa.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load_2darraymsaa:
	; GCN: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps <4 x float> @load_2darraymsaa(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x float> @load_2darraymsaa(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_2darraymsaa:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_2darraymsaa:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_MSAA_ARRAY unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%slice = extractelement <2 x i16> %coords_hi, i32 0			%slice = extractelement <2 x i16> %coords_hi, i32 0
	%fragid = extractelement <2 x i16> %coords_hi, i32 1			%fragid = extractelement <2 x i16> %coords_hi, i32 1
	%v = call <4 x float> @llvm.amdgcn.image.load.2darraymsaa.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.2darraymsaa.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load_mip_1d:
	; GCN: image_load_mip v[0:3], v0, s[0:7] dmask:0xf unorm a16
	define amdgpu_ps <4 x float> @load_mip_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load_mip_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_mip_1d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load_mip v[0:3], v0, s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_mip_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load_mip v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	%mip = extractelement <2 x i16> %coords, i32 1			%mip = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i16(i32 15, i16 %s, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i16(i32 15, i16 %s, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load_mip_2d:
	; GCN: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16
	define amdgpu_ps <4 x float> @load_mip_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x float> @load_mip_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_mip_2d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_mip_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%mip = extractelement <2 x i16> %coords_hi, i32 0			%mip = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load_mip_3d:
	; GCN: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16
	define amdgpu_ps <4 x float> @load_mip_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x float> @load_mip_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_mip_3d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_mip_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_3D unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%r = extractelement <2 x i16> %coords_hi, i32 0			%r = extractelement <2 x i16> %coords_hi, i32 0
	%mip = extractelement <2 x i16> %coords_hi, i32 1			%mip = extractelement <2 x i16> %coords_hi, i32 1
	%v = call <4 x float> @llvm.amdgcn.image.load.mip.3d.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %r, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.mip.3d.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %r, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load_mip_cube:
	; GCN: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps <4 x float> @load_mip_cube(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x float> @load_mip_cube(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_mip_cube:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_mip_cube:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_CUBE unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%slice = extractelement <2 x i16> %coords_hi, i32 0			%slice = extractelement <2 x i16> %coords_hi, i32 0
	%mip = extractelement <2 x i16> %coords_hi, i32 1			%mip = extractelement <2 x i16> %coords_hi, i32 1
	%v = call <4 x float> @llvm.amdgcn.image.load.mip.cube.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.mip.cube.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load_mip_1darray:
	; GCN: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps <4 x float> @load_mip_1darray(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x float> @load_mip_1darray(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_mip_1darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_mip_1darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D_ARRAY unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%slice = extractelement <2 x i16> %coords_lo, i32 1			%slice = extractelement <2 x i16> %coords_lo, i32 1
	%mip = extractelement <2 x i16> %coords_hi, i32 0			%mip = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.mip.1darray.v4f32.i16(i32 15, i16 %s, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.mip.1darray.v4f32.i16(i32 15, i16 %s, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load_mip_2darray:
	; GCN: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps <4 x float> @load_mip_2darray(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x float> @load_mip_2darray(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_mip_2darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_mip_2darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_ARRAY unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%slice = extractelement <2 x i16> %coords_hi, i32 0			%slice = extractelement <2 x i16> %coords_hi, i32 0
	%mip = extractelement <2 x i16> %coords_hi, i32 1			%mip = extractelement <2 x i16> %coords_hi, i32 1
	%v = call <4 x float> @llvm.amdgcn.image.load.mip.2darray.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.mip.2darray.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}store_1d:
	; GCN: image_store v[0:3], v4, s[0:7] dmask:0xf unorm a16
	define amdgpu_ps void @store_1d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {			define amdgpu_ps void @store_1d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_1d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_2d:
	; GCN: image_store v[0:3], v4, s[0:7] dmask:0xf unorm a16
	define amdgpu_ps void @store_2d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {			define amdgpu_ps void @store_2d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_2d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	%t = extractelement <2 x i16> %coords, i32 1			%t = extractelement <2 x i16> %coords, i32 1
	call void @llvm.amdgcn.image.store.2d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.2d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_3d:
	; GCN: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16
	define amdgpu_ps void @store_3d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps void @store_3d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_3d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_3D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%r = extractelement <2 x i16> %coords_hi, i32 0			%r = extractelement <2 x i16> %coords_hi, i32 0
	call void @llvm.amdgcn.image.store.3d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %r, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.3d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %r, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_cube:
	; GCN: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps void @store_cube(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps void @store_cube(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_cube:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_cube:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_CUBE unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%slice = extractelement <2 x i16> %coords_hi, i32 0			%slice = extractelement <2 x i16> %coords_hi, i32 0
	call void @llvm.amdgcn.image.store.cube.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.cube.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_1darray:
	; GCN: image_store v[0:3], v4, s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps void @store_1darray(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {			define amdgpu_ps void @store_1darray(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_1darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_1darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D_ARRAY unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	%slice = extractelement <2 x i16> %coords, i32 1			%slice = extractelement <2 x i16> %coords, i32 1
	call void @llvm.amdgcn.image.store.1darray.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.1darray.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_2darray:
	; GCN: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps void @store_2darray(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps void @store_2darray(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_2darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_2darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_ARRAY unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%slice = extractelement <2 x i16> %coords_hi, i32 0			%slice = extractelement <2 x i16> %coords_hi, i32 0
	call void @llvm.amdgcn.image.store.2darray.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.2darray.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_2dmsaa:
	; GCN: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16
	define amdgpu_ps void @store_2dmsaa(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps void @store_2dmsaa(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_2dmsaa:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_2dmsaa:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_MSAA unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%fragid = extractelement <2 x i16> %coords_hi, i32 0			%fragid = extractelement <2 x i16> %coords_hi, i32 0
	call void @llvm.amdgcn.image.store.2dmsaa.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.2dmsaa.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_2darraymsaa:
	; GCN: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps void @store_2darraymsaa(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps void @store_2darraymsaa(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_2darraymsaa:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_2darraymsaa:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_MSAA_ARRAY unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%slice = extractelement <2 x i16> %coords_hi, i32 0			%slice = extractelement <2 x i16> %coords_hi, i32 0
	%fragid = extractelement <2 x i16> %coords_hi, i32 1			%fragid = extractelement <2 x i16> %coords_hi, i32 1
	call void @llvm.amdgcn.image.store.2darraymsaa.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.2darraymsaa.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_mip_1d:
	; GCN: image_store_mip v[0:3], v4, s[0:7] dmask:0xf unorm a16
	define amdgpu_ps void @store_mip_1d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {			define amdgpu_ps void @store_mip_1d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_mip_1d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store_mip v[0:3], v4, s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_mip_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store_mip v[0:3], v4, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	%mip = extractelement <2 x i16> %coords, i32 1			%mip = extractelement <2 x i16> %coords, i32 1
	call void @llvm.amdgcn.image.store.mip.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.mip.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_mip_2d:
	; GCN: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16
	define amdgpu_ps void @store_mip_2d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps void @store_mip_2d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_mip_2d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_mip_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%mip = extractelement <2 x i16> %coords_hi, i32 0			%mip = extractelement <2 x i16> %coords_hi, i32 0
	call void @llvm.amdgcn.image.store.mip.2d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.mip.2d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_mip_3d:
	; GCN: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16
	define amdgpu_ps void @store_mip_3d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps void @store_mip_3d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_mip_3d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_mip_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_3D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%r = extractelement <2 x i16> %coords_hi, i32 0			%r = extractelement <2 x i16> %coords_hi, i32 0
	%mip = extractelement <2 x i16> %coords_hi, i32 1			%mip = extractelement <2 x i16> %coords_hi, i32 1
	call void @llvm.amdgcn.image.store.mip.3d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %r, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.mip.3d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %r, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_mip_cube:
	; GCN: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps void @store_mip_cube(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps void @store_mip_cube(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_mip_cube:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_mip_cube:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_CUBE unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%slice = extractelement <2 x i16> %coords_hi, i32 0			%slice = extractelement <2 x i16> %coords_hi, i32 0
	%mip = extractelement <2 x i16> %coords_hi, i32 1			%mip = extractelement <2 x i16> %coords_hi, i32 1
	call void @llvm.amdgcn.image.store.mip.cube.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.mip.cube.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_mip_1darray:
	; GCN: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps void @store_mip_1darray(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps void @store_mip_1darray(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_mip_1darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_mip_1darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D_ARRAY unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%slice = extractelement <2 x i16> %coords_lo, i32 1			%slice = extractelement <2 x i16> %coords_lo, i32 1
	%mip = extractelement <2 x i16> %coords_hi, i32 0			%mip = extractelement <2 x i16> %coords_hi, i32 0
	call void @llvm.amdgcn.image.store.mip.1darray.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.mip.1darray.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_mip_2darray:
	; GCN: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps void @store_mip_2darray(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps void @store_mip_2darray(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_mip_2darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_mip_2darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_ARRAY unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords_lo, i32 0			%s = extractelement <2 x i16> %coords_lo, i32 0
	%t = extractelement <2 x i16> %coords_lo, i32 1			%t = extractelement <2 x i16> %coords_lo, i32 1
	%slice = extractelement <2 x i16> %coords_hi, i32 0			%slice = extractelement <2 x i16> %coords_hi, i32 0
	%mip = extractelement <2 x i16> %coords_hi, i32 1			%mip = extractelement <2 x i16> %coords_hi, i32 1
	call void @llvm.amdgcn.image.store.mip.2darray.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.mip.2darray.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}getresinfo_1d:
	; GCN: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16
	define amdgpu_ps <4 x float> @getresinfo_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @getresinfo_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: getresinfo_1d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%mip = extractelement <2 x i16> %coords, i32 0			%mip = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.getresinfo.1d.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.getresinfo.1d.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}getresinfo_2d:
	; GCN: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16
	define amdgpu_ps <4 x float> @getresinfo_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @getresinfo_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: getresinfo_2d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%mip = extractelement <2 x i16> %coords, i32 0			%mip = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.getresinfo.2d.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.getresinfo.2d.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}getresinfo_3d:
	; GCN: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16
	define amdgpu_ps <4 x float> @getresinfo_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @getresinfo_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: getresinfo_3d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_3D unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%mip = extractelement <2 x i16> %coords, i32 0			%mip = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.getresinfo.3d.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.getresinfo.3d.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}getresinfo_cube:
	; GCN: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps <4 x float> @getresinfo_cube(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @getresinfo_cube(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: getresinfo_cube:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_cube:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_CUBE unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%mip = extractelement <2 x i16> %coords, i32 0			%mip = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.getresinfo.cube.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.getresinfo.cube.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}getresinfo_1darray:
	; GCN: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps <4 x float> @getresinfo_1darray(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @getresinfo_1darray(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: getresinfo_1darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_1darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D_ARRAY unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%mip = extractelement <2 x i16> %coords, i32 0			%mip = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.getresinfo.1darray.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.getresinfo.1darray.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}getresinfo_2darray:
	; GCN: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps <4 x float> @getresinfo_2darray(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @getresinfo_2darray(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: getresinfo_2darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_2darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_ARRAY unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%mip = extractelement <2 x i16> %coords, i32 0			%mip = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.getresinfo.2darray.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.getresinfo.2darray.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}getresinfo_2dmsaa:
	; GCN: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16
	define amdgpu_ps <4 x float> @getresinfo_2dmsaa(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @getresinfo_2dmsaa(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: getresinfo_2dmsaa:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_2dmsaa:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_MSAA unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%mip = extractelement <2 x i16> %coords, i32 0			%mip = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.getresinfo.2dmsaa.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.getresinfo.2dmsaa.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}getresinfo_2darraymsaa:
	; GCN: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16 da{{$}}
	define amdgpu_ps <4 x float> @getresinfo_2darraymsaa(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @getresinfo_2darraymsaa(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: getresinfo_2darraymsaa:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16 da
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_2darraymsaa:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_MSAA_ARRAY unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%mip = extractelement <2 x i16> %coords, i32 0			%mip = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.getresinfo.2darraymsaa.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.getresinfo.2darraymsaa.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load_1d_V1:
	; GCN: image_load v0, v0, s[0:7] dmask:0x8 unorm a16
	define amdgpu_ps float @load_1d_V1(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps float @load_1d_V1(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_1d_V1:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v0, v0, s[0:7] dmask:0x8 unorm a16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_1d_V1:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v0, v0, s[0:7] dmask:0x8 dim:SQ_RSRC_IMG_1D unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	%v = call float @llvm.amdgcn.image.load.1d.f32.i16(i32 8, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)			%v = call float @llvm.amdgcn.image.load.1d.f32.i16(i32 8, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
	ret float %v			ret float %v
	}			}

	; GCN-LABEL: {{^}}load_1d_V2:
	; GCN: image_load v[0:1], v0, s[0:7] dmask:0x9 unorm a16
	define amdgpu_ps <2 x float> @load_1d_V2(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <2 x float> @load_1d_V2(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_1d_V2:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:1], v0, s[0:7] dmask:0x9 unorm a16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_1d_V2:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:1], v0, s[0:7] dmask:0x9 dim:SQ_RSRC_IMG_1D unorm a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	%v = call <2 x float> @llvm.amdgcn.image.load.1d.v2f32.i16(i32 9, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <2 x float> @llvm.amdgcn.image.load.1d.v2f32.i16(i32 9, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
	ret <2 x float> %v			ret <2 x float> %v
	}			}

	; GCN-LABEL: {{^}}store_1d_V1:
	; GCN: image_store v0, v1, s[0:7] dmask:0x2 unorm a16
	define amdgpu_ps void @store_1d_V1(<8 x i32> inreg %rsrc, float %vdata, <2 x i16> %coords) {			define amdgpu_ps void @store_1d_V1(<8 x i32> inreg %rsrc, float %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_1d_V1:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v0, v1, s[0:7] dmask:0x2 unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_1d_V1:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v0, v1, s[0:7] dmask:0x2 dim:SQ_RSRC_IMG_1D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	call void @llvm.amdgcn.image.store.1d.f32.i16(float %vdata, i32 2, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.1d.f32.i16(float %vdata, i32 2, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_1d_V2:
	; GCN: image_store v[0:1], v2, s[0:7] dmask:0xc unorm a16
	define amdgpu_ps void @store_1d_V2(<8 x i32> inreg %rsrc, <2 x float> %vdata, <2 x i16> %coords) {			define amdgpu_ps void @store_1d_V2(<8 x i32> inreg %rsrc, <2 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_1d_V2:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:1], v2, s[0:7] dmask:0xc unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_1d_V2:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:1], v2, s[0:7] dmask:0xc dim:SQ_RSRC_IMG_1D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	call void @llvm.amdgcn.image.store.1d.v2f32.i16(<2 x float> %vdata, i32 12, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.1d.v2f32.i16(<2 x float> %vdata, i32 12, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_1d_glc:
	; GCN: image_load v[0:3], v0, s[0:7] dmask:0xf unorm glc a16{{$}}
	define amdgpu_ps <4 x float> @load_1d_glc(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load_1d_glc(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_1d_glc:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf unorm glc a16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_1d_glc:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm glc a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 1)			%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 1)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load_1d_slc:
	; GCN: image_load v[0:3], v0, s[0:7] dmask:0xf unorm slc a16{{$}}
	define amdgpu_ps <4 x float> @load_1d_slc(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load_1d_slc(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_1d_slc:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf unorm slc a16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_1d_slc:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm slc a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 2)			%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 2)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load_1d_glc_slc:
	; GCN: image_load v[0:3], v0, s[0:7] dmask:0xf unorm glc slc a16{{$}}
	define amdgpu_ps <4 x float> @load_1d_glc_slc(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load_1d_glc_slc(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_1d_glc_slc:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf unorm glc slc a16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_1d_glc_slc:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm glc slc a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 3)			%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 3)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}store_1d_glc:
	; GCN: image_store v[0:3], v4, s[0:7] dmask:0xf unorm glc a16{{$}}
	define amdgpu_ps void @store_1d_glc(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {			define amdgpu_ps void @store_1d_glc(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_1d_glc:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf unorm glc a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_1d_glc:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm glc a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 1)			call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 1)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_1d_slc:
	; GCN: image_store v[0:3], v4, s[0:7] dmask:0xf unorm slc a16{{$}}
	define amdgpu_ps void @store_1d_slc(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {			define amdgpu_ps void @store_1d_slc(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_1d_slc:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf unorm slc a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_1d_slc:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm slc a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 2)			call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 2)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_1d_glc_slc:
	; GCN: image_store v[0:3], v4, s[0:7] dmask:0xf unorm glc slc a16{{$}}
	define amdgpu_ps void @store_1d_glc_slc(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {			define amdgpu_ps void @store_1d_glc_slc(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_1d_glc_slc:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf unorm glc slc a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_1d_glc_slc:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm glc slc a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%s = extractelement <2 x i16> %coords, i32 0			%s = extractelement <2 x i16> %coords, i32 0
	call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 3)			call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 3)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}getresinfo_dmask0:
	; GCN-NOT: image
	; GCN: ; return to shader part epilog
	define amdgpu_ps <4 x float> @getresinfo_dmask0(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) #0 {			define amdgpu_ps <4 x float> @getresinfo_dmask0(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) #0 {
				; GFX9-LABEL: getresinfo_dmask0:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_dmask0:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%mip = extractelement <2 x i16> %coords, i32 0			%mip = extractelement <2 x i16> %coords, i32 0
	%r = call <4 x float> @llvm.amdgcn.image.getresinfo.1d.v4f32.i16(i32 0, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%r = call <4 x float> @llvm.amdgcn.image.getresinfo.1d.v4f32.i16(i32 0, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	declare <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32, i16, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32, i16, <8 x i32>, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32, i16, i16, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32, i16, i16, <8 x i32>, i32, i32) #1

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -march=amdgcn -mcpu=gfx900 -show-mc-encoding < %s | FileCheck -check-prefixes=GFX9 %s
				; RUN: llc -march=amdgcn -mcpu=gfx1010 -show-mc-encoding < %s | FileCheck -check-prefixes=GFX10 %s

				define amdgpu_ps <4 x float> @load_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_1d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x00,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm a16 ; encoding: [0x00,0x1f,0x00,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_2d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x00,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm a16 ; encoding: [0x08,0x1f,0x00,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%t = extractelement <2 x i16> %coords, i32 1
				%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 15, i16 %s, i16 %t, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_3d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x00,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_3D unorm a16 ; encoding: [0x10,0x1f,0x00,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%r = extractelement <2 x i16> %coords_hi, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %r, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_cube(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_cube:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x00,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_cube:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_CUBE unorm a16 ; encoding: [0x18,0x1f,0x00,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.cube.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_1darray(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_1darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x00,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_1darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D_ARRAY unorm a16 ; encoding: [0x20,0x1f,0x00,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%slice = extractelement <2 x i16> %coords, i32 1
				%v = call <4 x float> @llvm.amdgcn.image.load.1darray.v4f32.i16(i32 15, i16 %s, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_2darray(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_2darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x00,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_2darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_ARRAY unorm a16 ; encoding: [0x28,0x1f,0x00,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.2darray.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_2dmsaa(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_2dmsaa:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x00,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_2dmsaa:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_MSAA unorm a16 ; encoding: [0x30,0x1f,0x00,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%fragid = extractelement <2 x i16> %coords_hi, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.2dmsaa.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_2darraymsaa(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_2darraymsaa:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x00,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_2darraymsaa:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_MSAA_ARRAY unorm a16 ; encoding: [0x38,0x1f,0x00,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%fragid = extractelement <2 x i16> %coords_hi, i32 1
				%v = call <4 x float> @llvm.amdgcn.image.load.2darraymsaa.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_mip_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_mip_1d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load_mip v[0:3], v0, s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x04,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_mip_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load_mip v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm a16 ; encoding: [0x00,0x1f,0x04,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%mip = extractelement <2 x i16> %coords, i32 1
				%v = call <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i16(i32 15, i16 %s, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_mip_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_mip_2d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x04,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_mip_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm a16 ; encoding: [0x08,0x1f,0x04,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%mip = extractelement <2 x i16> %coords_hi, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_mip_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_mip_3d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x04,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_mip_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_3D unorm a16 ; encoding: [0x10,0x1f,0x04,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%r = extractelement <2 x i16> %coords_hi, i32 0
				%mip = extractelement <2 x i16> %coords_hi, i32 1
				%v = call <4 x float> @llvm.amdgcn.image.load.mip.3d.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %r, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_mip_cube(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_mip_cube:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x04,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_mip_cube:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_CUBE unorm a16 ; encoding: [0x18,0x1f,0x04,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%mip = extractelement <2 x i16> %coords_hi, i32 1
				%v = call <4 x float> @llvm.amdgcn.image.load.mip.cube.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_mip_1darray(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_mip_1darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x04,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_mip_1darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D_ARRAY unorm a16 ; encoding: [0x20,0x1f,0x04,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%slice = extractelement <2 x i16> %coords_lo, i32 1
				%mip = extractelement <2 x i16> %coords_hi, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.mip.1darray.v4f32.i16(i32 15, i16 %s, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_mip_2darray(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: load_mip_2darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x04,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_mip_2darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_ARRAY unorm a16 ; encoding: [0x28,0x1f,0x04,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%mip = extractelement <2 x i16> %coords_hi, i32 1
				%v = call <4 x float> @llvm.amdgcn.image.load.mip.2darray.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps void @store_1d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_1d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x20,0xf0,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm a16 ; encoding: [0x00,0x1f,0x20,0xf0,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_2d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_2d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x20,0xf0,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm a16 ; encoding: [0x08,0x1f,0x20,0xf0,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%t = extractelement <2 x i16> %coords, i32 1
				call void @llvm.amdgcn.image.store.2d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_3d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_3d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x20,0xf0,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_3D unorm a16 ; encoding: [0x10,0x1f,0x20,0xf0,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%r = extractelement <2 x i16> %coords_hi, i32 0
				call void @llvm.amdgcn.image.store.3d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %r, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_cube(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_cube:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x20,0xf0,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_cube:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_CUBE unorm a16 ; encoding: [0x18,0x1f,0x20,0xf0,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				call void @llvm.amdgcn.image.store.cube.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_1darray(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_1darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x20,0xf0,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_1darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D_ARRAY unorm a16 ; encoding: [0x20,0x1f,0x20,0xf0,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%slice = extractelement <2 x i16> %coords, i32 1
				call void @llvm.amdgcn.image.store.1darray.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_2darray(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_2darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x20,0xf0,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_2darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_ARRAY unorm a16 ; encoding: [0x28,0x1f,0x20,0xf0,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				call void @llvm.amdgcn.image.store.2darray.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_2dmsaa(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_2dmsaa:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x20,0xf0,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_2dmsaa:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_MSAA unorm a16 ; encoding: [0x30,0x1f,0x20,0xf0,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%fragid = extractelement <2 x i16> %coords_hi, i32 0
				call void @llvm.amdgcn.image.store.2dmsaa.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_2darraymsaa(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_2darraymsaa:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x20,0xf0,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_2darraymsaa:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_MSAA_ARRAY unorm a16 ; encoding: [0x38,0x1f,0x20,0xf0,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%fragid = extractelement <2 x i16> %coords_hi, i32 1
				call void @llvm.amdgcn.image.store.2darraymsaa.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_mip_1d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_mip_1d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store_mip v[0:3], v4, s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x24,0xf0,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_mip_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store_mip v[0:3], v4, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm a16 ; encoding: [0x00,0x1f,0x24,0xf0,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%mip = extractelement <2 x i16> %coords, i32 1
				call void @llvm.amdgcn.image.store.mip.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_mip_2d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_mip_2d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x24,0xf0,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_mip_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm a16 ; encoding: [0x08,0x1f,0x24,0xf0,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%mip = extractelement <2 x i16> %coords_hi, i32 0
				call void @llvm.amdgcn.image.store.mip.2d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_mip_3d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_mip_3d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x24,0xf0,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_mip_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_3D unorm a16 ; encoding: [0x10,0x1f,0x24,0xf0,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%r = extractelement <2 x i16> %coords_hi, i32 0
				%mip = extractelement <2 x i16> %coords_hi, i32 1
				call void @llvm.amdgcn.image.store.mip.3d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %r, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_mip_cube(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_mip_cube:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x24,0xf0,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_mip_cube:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_CUBE unorm a16 ; encoding: [0x18,0x1f,0x24,0xf0,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%mip = extractelement <2 x i16> %coords_hi, i32 1
				call void @llvm.amdgcn.image.store.mip.cube.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_mip_1darray(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_mip_1darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x24,0xf0,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_mip_1darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D_ARRAY unorm a16 ; encoding: [0x20,0x1f,0x24,0xf0,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%slice = extractelement <2 x i16> %coords_lo, i32 1
				%mip = extractelement <2 x i16> %coords_hi, i32 0
				call void @llvm.amdgcn.image.store.mip.1darray.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_mip_2darray(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: store_mip_2darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x24,0xf0,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_mip_2darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store_mip v[0:3], v[4:5], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_ARRAY unorm a16 ; encoding: [0x28,0x1f,0x24,0xf0,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%mip = extractelement <2 x i16> %coords_hi, i32 1
				call void @llvm.amdgcn.image.store.mip.2darray.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps <4 x float> @getresinfo_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: getresinfo_1d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x38,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm a16 ; encoding: [0x00,0x1f,0x38,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.getresinfo.1d.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @getresinfo_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: getresinfo_2d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x38,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm a16 ; encoding: [0x08,0x1f,0x38,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.getresinfo.2d.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @getresinfo_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: getresinfo_3d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x38,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_3D unorm a16 ; encoding: [0x10,0x1f,0x38,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.getresinfo.3d.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @getresinfo_cube(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: getresinfo_cube:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x38,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_cube:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_CUBE unorm a16 ; encoding: [0x18,0x1f,0x38,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.getresinfo.cube.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @getresinfo_1darray(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: getresinfo_1darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x38,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_1darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D_ARRAY unorm a16 ; encoding: [0x20,0x1f,0x38,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.getresinfo.1darray.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @getresinfo_2darray(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: getresinfo_2darray:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x38,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_2darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_ARRAY unorm a16 ; encoding: [0x28,0x1f,0x38,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.getresinfo.2darray.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @getresinfo_2dmsaa(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: getresinfo_2dmsaa:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16 ; encoding: [0x00,0x9f,0x38,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_2dmsaa:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_MSAA unorm a16 ; encoding: [0x30,0x1f,0x38,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.getresinfo.2dmsaa.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @getresinfo_2darraymsaa(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: getresinfo_2darraymsaa:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf unorm a16 da ; encoding: [0x00,0xdf,0x38,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_2darraymsaa:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_get_resinfo v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D_MSAA_ARRAY unorm a16 ; encoding: [0x38,0x1f,0x38,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.getresinfo.2darraymsaa.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps float @load_1d_V1(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_1d_V1:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v0, v0, s[0:7] dmask:0x8 unorm a16 ; encoding: [0x00,0x98,0x00,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_1d_V1:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v0, v0, s[0:7] dmask:0x8 dim:SQ_RSRC_IMG_1D unorm a16 ; encoding: [0x00,0x18,0x00,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%v = call float @llvm.amdgcn.image.load.1d.f32.i16(i32 8, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				ret float %v
				}

				define amdgpu_ps <2 x float> @load_1d_V2(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_1d_V2:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:1], v0, s[0:7] dmask:0x9 unorm a16 ; encoding: [0x00,0x99,0x00,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_1d_V2:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:1], v0, s[0:7] dmask:0x9 dim:SQ_RSRC_IMG_1D unorm a16 ; encoding: [0x00,0x19,0x00,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%v = call <2 x float> @llvm.amdgcn.image.load.1d.v2f32.i16(i32 9, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				ret <2 x float> %v
				}

				define amdgpu_ps void @store_1d_V1(<8 x i32> inreg %rsrc, float %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_1d_V1:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v0, v1, s[0:7] dmask:0x2 unorm a16 ; encoding: [0x00,0x92,0x20,0xf0,0x01,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_1d_V1:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v0, v1, s[0:7] dmask:0x2 dim:SQ_RSRC_IMG_1D unorm a16 ; encoding: [0x00,0x12,0x20,0xf0,0x01,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				call void @llvm.amdgcn.image.store.1d.f32.i16(float %vdata, i32 2, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_1d_V2(<8 x i32> inreg %rsrc, <2 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_1d_V2:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:1], v2, s[0:7] dmask:0xc unorm a16 ; encoding: [0x00,0x9c,0x20,0xf0,0x02,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_1d_V2:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:1], v2, s[0:7] dmask:0xc dim:SQ_RSRC_IMG_1D unorm a16 ; encoding: [0x00,0x1c,0x20,0xf0,0x02,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				call void @llvm.amdgcn.image.store.1d.v2f32.i16(<2 x float> %vdata, i32 12, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps <4 x float> @load_1d_glc(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_1d_glc:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf unorm glc a16 ; encoding: [0x00,0xbf,0x00,0xf0,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_1d_glc:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm glc a16 ; encoding: [0x00,0x3f,0x00,0xf0,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 1)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_1d_slc(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_1d_slc:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf unorm slc a16 ; encoding: [0x00,0x9f,0x00,0xf2,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_1d_slc:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm slc a16 ; encoding: [0x00,0x1f,0x00,0xf2,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 2)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_1d_glc_slc(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: load_1d_glc_slc:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf unorm glc slc a16 ; encoding: [0x00,0xbf,0x00,0xf2,0x00,0x00,0x00,0x00]
				; GFX9-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: load_1d_glc_slc:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm glc slc a16 ; encoding: [0x00,0x3f,0x00,0xf2,0x00,0x00,0x00,0x40]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 3)
				ret <4 x float> %v
				}

				define amdgpu_ps void @store_1d_glc(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_1d_glc:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf unorm glc a16 ; encoding: [0x00,0xbf,0x20,0xf0,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_1d_glc:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm glc a16 ; encoding: [0x00,0x3f,0x20,0xf0,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 1)
				ret void
				}

				define amdgpu_ps void @store_1d_slc(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_1d_slc:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf unorm slc a16 ; encoding: [0x00,0x9f,0x20,0xf2,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_1d_slc:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm slc a16 ; encoding: [0x00,0x1f,0x20,0xf2,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 2)
				ret void
				}

				define amdgpu_ps void @store_1d_glc_slc(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: store_1d_glc_slc:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf unorm glc slc a16 ; encoding: [0x00,0xbf,0x20,0xf2,0x04,0x00,0x00,0x00]
				; GFX9-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; GFX10-LABEL: store_1d_glc_slc:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[0:3], v4, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm glc slc a16 ; encoding: [0x00,0x3f,0x20,0xf2,0x04,0x00,0x00,0x40]
				; GFX10-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 3)
				ret void
				}

				define amdgpu_ps <4 x float> @getresinfo_dmask0(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) #0 {
				; GFX9-LABEL: getresinfo_dmask0:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: getresinfo_dmask0:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: ; return to shader part epilog
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%r = call <4 x float> @llvm.amdgcn.image.getresinfo.1d.v4f32.i16(i32 0, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %r
				}

				declare <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32, i16, <8 x i32>, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32, i16, i16, <8 x i32>, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32, i16, i16, i16, <8 x i32>, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.load.cube.v4f32.i16(i32, i16, i16, i16, <8 x i32>, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.load.1darray.v4f32.i16(i32, i16, i16, <8 x i32>, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.load.2darray.v4f32.i16(i32, i16, i16, i16, <8 x i32>, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.load.2dmsaa.v4f32.i16(i32, i16, i16, i16, <8 x i32>, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.load.2darraymsaa.v4f32.i16(i32, i16, i16, i16, i16, <8 x i32>, i32, i32) #1

				declare <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i16(i32, i16, i16, <8 x i32>, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i16(i32, i16, i16, i16, <8 x i32>, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.load.mip.3d.v4f32.i16(i32, i16, i16, i16, i16, <8 x i32>, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.load.mip.cube.v4f32.i16(i32, i16, i16, i16, i16, <8 x i32>, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.load.mip.1darray.v4f32.i16(i32, i16, i16, i16, <8 x i32>, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.load.mip.2darray.v4f32.i16(i32, i16, i16, i16, i16, <8 x i32>, i32, i32) #1

				declare void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float>, i32, i16, <8 x i32>, i32, i32) #0
				declare void @llvm.amdgcn.image.store.2d.v4f32.i16(<4 x float>, i32, i16, i16, <8 x i32>, i32, i32) #0
				declare void @llvm.amdgcn.image.store.3d.v4f32.i16(<4 x float>, i32, i16, i16, i16, <8 x i32>, i32, i32) #0
				declare void @llvm.amdgcn.image.store.cube.v4f32.i16(<4 x float>, i32, i16, i16, i16, <8 x i32>, i32, i32) #0
				declare void @llvm.amdgcn.image.store.1darray.v4f32.i16(<4 x float>, i32, i16, i16, <8 x i32>, i32, i32) #0
				declare void @llvm.amdgcn.image.store.2darray.v4f32.i16(<4 x float>, i32, i16, i16, i16, <8 x i32>, i32, i32) #0
				declare void @llvm.amdgcn.image.store.2dmsaa.v4f32.i16(<4 x float>, i32, i16, i16, i16, <8 x i32>, i32, i32) #0
				declare void @llvm.amdgcn.image.store.2darraymsaa.v4f32.i16(<4 x float>, i32, i16, i16, i16, i16, <8 x i32>, i32, i32) #0

				declare void @llvm.amdgcn.image.store.mip.1d.v4f32.i16(<4 x float>, i32, i16, i16, <8 x i32>, i32, i32) #0
				declare void @llvm.amdgcn.image.store.mip.2d.v4f32.i16(<4 x float>, i32, i16, i16, i16, <8 x i32>, i32, i32) #0
				declare void @llvm.amdgcn.image.store.mip.3d.v4f32.i16(<4 x float>, i32, i16, i16, i16, i16, <8 x i32>, i32, i32) #0
				declare void @llvm.amdgcn.image.store.mip.cube.v4f32.i16(<4 x float>, i32, i16, i16, i16, i16, <8 x i32>, i32, i32) #0
				declare void @llvm.amdgcn.image.store.mip.1darray.v4f32.i16(<4 x float>, i32, i16, i16, i16, <8 x i32>, i32, i32) #0
				declare void @llvm.amdgcn.image.store.mip.2darray.v4f32.i16(<4 x float>, i32, i16, i16, i16, i16, <8 x i32>, i32, i32) #0

				declare <4 x float> @llvm.amdgcn.image.getresinfo.1d.v4f32.i16(i32, i16, <8 x i32>, i32, i32) #2
				declare <4 x float> @llvm.amdgcn.image.getresinfo.2d.v4f32.i16(i32, i16, <8 x i32>, i32, i32) #2
				declare <4 x float> @llvm.amdgcn.image.getresinfo.3d.v4f32.i16(i32, i16, <8 x i32>, i32, i32) #2
				declare <4 x float> @llvm.amdgcn.image.getresinfo.cube.v4f32.i16(i32, i16, <8 x i32>, i32, i32) #2
				declare <4 x float> @llvm.amdgcn.image.getresinfo.1darray.v4f32.i16(i32, i16, <8 x i32>, i32, i32) #2
				declare <4 x float> @llvm.amdgcn.image.getresinfo.2darray.v4f32.i16(i32, i16, <8 x i32>, i32, i32) #2
				declare <4 x float> @llvm.amdgcn.image.getresinfo.2dmsaa.v4f32.i16(i32, i16, <8 x i32>, i32, i32) #2
				declare <4 x float> @llvm.amdgcn.image.getresinfo.2darraymsaa.v4f32.i16(i32, i16, <8 x i32>, i32, i32) #2

				declare float @llvm.amdgcn.image.load.1d.f32.i16(i32, i16, <8 x i32>, i32, i32) #1
				declare float @llvm.amdgcn.image.load.2d.f32.i16(i32, i16, i16, <8 x i32>, i32, i32) #1
				declare <2 x float> @llvm.amdgcn.image.load.1d.v2f32.i16(i32, i16, <8 x i32>, i32, i32) #1
				declare void @llvm.amdgcn.image.store.1d.f32.i16(float, i32, i16, <8 x i32>, i32, i32) #0
				declare void @llvm.amdgcn.image.store.1d.v2f32.i16(<2 x float>, i32, i16, <8 x i32>, i32, i32) #0

				attributes #0 = { nounwind }
				attributes #1 = { nounwind readonly }
				attributes #2 = { nounwind readnone }

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s
				; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s

	define amdgpu_ps <4 x float> @gather4_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {			define amdgpu_ps <4 x float> @gather4_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {
	; GCN-LABEL: gather4_2d:			; GFX9-LABEL: gather4_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GCN-NEXT: v_lshl_or_b32 v0, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v0, v1, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_gather4 v[0:3], v0, s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4 v[0:3], v0, s[0:7], s[8:11] dmask:0x1 a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: gather4_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
				; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_gather4 v[0:3], v0, s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f16(i32 1, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f16(i32 1, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_cube(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %face) {			define amdgpu_ps <4 x float> @gather4_cube(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %face) {
	; GCN-LABEL: gather4_cube:			; GFX9-LABEL: gather4_cube:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GCN-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_gather4 v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 a16 da			; GFX9-NEXT: image_gather4 v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 a16 da
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: gather4_cube:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
				; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_gather4 v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_CUBE a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.cube.v4f32.f16(i32 1, half %s, half %t, half %face, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.cube.v4f32.f16(i32 1, half %s, half %t, half %face, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_2darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %slice) {			define amdgpu_ps <4 x float> @gather4_2darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %slice) {
	; GCN-LABEL: gather4_2darray:			; GFX9-LABEL: gather4_2darray:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GCN-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_gather4 v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 a16 da			; GFX9-NEXT: image_gather4 v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 a16 da
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: gather4_2darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
				; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_gather4 v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D_ARRAY a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.2darray.v4f32.f16(i32 1, half %s, half %t, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.2darray.v4f32.f16(i32 1, half %s, half %t, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_c_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {			define amdgpu_ps <4 x float> @gather4_c_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {
	; GCN-LABEL: gather4_c_2d:			; GFX9-LABEL: gather4_c_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GCN-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_gather4_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: gather4_c_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_gather4_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.2d.v4f32.f32(i32 1, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.2d.v4f32.f32(i32 1, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @gather4_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %clamp) {
	; GCN-LABEL: gather4_cl_2d:			; GFX9-LABEL: gather4_cl_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GCN-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_gather4_cl v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_cl v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: gather4_cl_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
				; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_gather4_cl v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.cl.2d.v4f32.f16(i32 1, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.cl.2d.v4f32.f16(i32 1, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_c_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @gather4_c_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %clamp) {
	; GCN-LABEL: gather4_c_cl_2d:			; GFX9-LABEL: gather4_c_cl_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_mov_b32_e32 v5, v3			; GFX9-NEXT: v_mov_b32_e32 v5, v3
	; GCN-NEXT: v_mov_b32_e32 v3, v0			; GFX9-NEXT: v_mov_b32_e32 v3, v0
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1
	; GCN-NEXT: v_lshl_or_b32 v4, v2, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_gather4_c_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_c_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0x1 a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: gather4_c_cl_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_gather4_c_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.2d.v4f32.f32(i32 1, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.2d.v4f32.f32(i32 1, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t) {			define amdgpu_ps <4 x float> @gather4_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t) {
	; GCN-LABEL: gather4_b_2d:			; GFX9-LABEL: gather4_b_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GCN-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: gather4_b_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f32.f16(i32 1, float %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f32.f16(i32 1, float %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t) {			define amdgpu_ps <4 x float> @gather4_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t) {
	; GCN-LABEL: gather4_c_b_2d:			; GFX9-LABEL: gather4_c_b_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GCN-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: gather4_c_b_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f32.f16(i32 1, float %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f32.f16(i32 1, float %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @gather4_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t, half %clamp) {
	; GCN-LABEL: gather4_b_cl_2d:			; GFX9-LABEL: gather4_b_cl_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_mov_b32_e32 v5, v3			; GFX9-NEXT: v_mov_b32_e32 v5, v3
	; GCN-NEXT: v_mov_b32_e32 v3, v0			; GFX9-NEXT: v_mov_b32_e32 v3, v0
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1
	; GCN-NEXT: v_lshl_or_b32 v4, v2, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_gather4_b_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_b_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0x1 a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: gather4_b_cl_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_gather4_b_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.b.cl.2d.v4f32.f32.f16(i32 1, float %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.b.cl.2d.v4f32.f32.f16(i32 1, float %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @gather4_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t, half %clamp) {
	; GCN-LABEL: gather4_c_b_cl_2d:			; GFX9-LABEL: gather4_c_b_cl_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_mov_b32_e32 v7, v4			; GFX9-NEXT: v_mov_b32_e32 v7, v4
	; GCN-NEXT: v_mov_b32_e32 v4, v0			; GFX9-NEXT: v_mov_b32_e32 v4, v0
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v2			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v2
	; GCN-NEXT: v_mov_b32_e32 v5, v1			; GFX9-NEXT: v_mov_b32_e32 v5, v1
	; GCN-NEXT: v_lshl_or_b32 v6, v3, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v6, v3, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_gather4_c_b_cl v[0:3], v[4:7], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_c_b_cl v[0:3], v[4:7], s[0:7], s[8:11] dmask:0x1 a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: gather4_c_b_cl_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_gather4_c_b_cl v[0:3], [v0, v1, v2, v4], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f16(i32 1, float %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f16(i32 1, float %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_l_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %lod) {			define amdgpu_ps <4 x float> @gather4_l_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %lod) {
	; GCN-LABEL: gather4_l_2d:			; GFX9-LABEL: gather4_l_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GCN-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GCN-NEXT: image_gather4_l v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_l v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: gather4_l_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0
				; GFX10-NEXT: image_gather4_l v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.l.2d.v4f32.f16(i32 1, half %s, half %t, half %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.l.2d.v4f32.f16(i32 1, half %s, half %t, half %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_c_l_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %lod) {			define amdgpu_ps <4 x float> @gather4_c_l_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %lod) {
	; GCN-LABEL: gather4_c_l_2d:			; GFX9-LABEL: gather4_c_l_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_mov_b32_e32 v5, v3			; GFX9-NEXT: v_mov_b32_e32 v5, v3
	; GCN-NEXT: v_mov_b32_e32 v3, v0			; GFX9-NEXT: v_mov_b32_e32 v3, v0
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1
	; GCN-NEXT: v_lshl_or_b32 v4, v2, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0
	; GCN-NEXT: image_gather4_c_l v[0:3], v[3:5], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_c_l v[0:3], v[3:5], s[0:7], s[8:11] dmask:0x1 a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: gather4_c_l_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: image_gather4_c_l v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.l.2d.v4f32.f32(i32 1, float %zcompare, half %s, half %t, half %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.l.2d.v4f32.f32(i32 1, float %zcompare, half %s, half %t, half %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_lz_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {			define amdgpu_ps <4 x float> @gather4_lz_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {
	; GCN-LABEL: gather4_lz_2d:			; GFX9-LABEL: gather4_lz_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GCN-NEXT: v_lshl_or_b32 v0, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v0, v1, 16, v0
	; GCN-NEXT: image_gather4_lz v[0:3], v0, s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_lz v[0:3], v0, s[0:7], s[8:11] dmask:0x1 a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: gather4_lz_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0
				; GFX10-NEXT: image_gather4_lz v[0:3], v0, s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.lz.2d.v4f32.f16(i32 1, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.lz.2d.v4f32.f16(i32 1, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_c_lz_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {			define amdgpu_ps <4 x float> @gather4_c_lz_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {
	; GCN-LABEL: gather4_c_lz_2d:			; GFX9-LABEL: gather4_c_lz_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GCN-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GCN-NEXT: image_gather4_c_lz v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_c_lz v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: gather4_c_lz_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: image_gather4_c_lz v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.lz.2d.v4f32.f32(i32 1, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.lz.2d.v4f32.f32(i32 1, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	declare <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f16(i32, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f16(i32, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.cube.v4f32.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.cube.v4f32.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.2darray.v4f32.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.2darray.v4f32.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.load.a16.d16.ll

	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX9 %s
				; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX10 %s

	; GCN-LABEL: {{^}}load.f16.1d:			; GCN-LABEL: {{^}}load.f16.1d:
	; GCN: image_load v0, v0, s[0:7] dmask:0x1 unorm a16 d16			; GFX9: image_load v0, v0, s[0:7] dmask:0x1 unorm a16 d16
				; GFX10: image_load v0, v0, s[0:7] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm a16 d16
	define amdgpu_ps <4 x half> @load.f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x half> @load.f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x half> @llvm.amdgcn.image.load.1d.v4f16.i16(i32 1, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.1d.v4f16.i16(i32 1, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.v2f16.1d:			; GCN-LABEL: {{^}}load.v2f16.1d:
	; GCN: image_load v0, v0, s[0:7] dmask:0x3 unorm a16 d16			; GFX9: image_load v0, v0, s[0:7] dmask:0x3 unorm a16 d16
				; GFX10: image_load v0, v0, s[0:7] dmask:0x3 dim:SQ_RSRC_IMG_1D unorm a16 d16
	define amdgpu_ps <4 x half> @load.v2f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x half> @load.v2f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x half> @llvm.amdgcn.image.load.1d.v4f16.i16(i32 3, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.1d.v4f16.i16(i32 3, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.v3f16.1d:			; GCN-LABEL: {{^}}load.v3f16.1d:
	; GCN: image_load v[0:1], v0, s[0:7] dmask:0x7 unorm a16 d16			; GFX9: image_load v[0:1], v0, s[0:7] dmask:0x7 unorm a16 d16
				; GFX10: image_load v[0:1], v0, s[0:7] dmask:0x7 dim:SQ_RSRC_IMG_1D unorm a16 d16
	define amdgpu_ps <4 x half> @load.v3f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x half> @load.v3f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x half> @llvm.amdgcn.image.load.1d.v4f16.i16(i32 7, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.1d.v4f16.i16(i32 7, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.v4f16.1d:			; GCN-LABEL: {{^}}load.v4f16.1d:
	; GCN: image_load v[0:1], v0, s[0:7] dmask:0xf unorm a16 d16			; GFX9: image_load v[0:1], v0, s[0:7] dmask:0xf unorm a16 d16
				; GFX10: image_load v[0:1], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm a16 d16
	define amdgpu_ps <4 x half> @load.v4f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x half> @load.v4f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x half> @llvm.amdgcn.image.load.1d.v4f16.i16(i32 15, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.1d.v4f16.i16(i32 15, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.f16.2d:			; GCN-LABEL: {{^}}load.f16.2d:
	; GCN: image_load v0, v0, s[0:7] dmask:0x1 unorm a16 d16			; GFX9: image_load v0, v0, s[0:7] dmask:0x1 unorm a16 d16
				; GFX10: image_load v0, v0, s[0:7] dmask:0x1 dim:SQ_RSRC_IMG_2D unorm a16 d16
	define amdgpu_ps <4 x half> @load.f16.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x half> @load.f16.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x half> @llvm.amdgcn.image.load.2d.v4f16.i16(i32 1, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.2d.v4f16.i16(i32 1, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.v2f16.2d:			; GCN-LABEL: {{^}}load.v2f16.2d:
	; GCN: image_load v0, v0, s[0:7] dmask:0x3 unorm a16 d16			; GFX9: image_load v0, v0, s[0:7] dmask:0x3 unorm a16 d16
				; GFX10: image_load v0, v0, s[0:7] dmask:0x3 dim:SQ_RSRC_IMG_2D unorm a16 d16
	define amdgpu_ps <4 x half> @load.v2f16.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x half> @load.v2f16.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x half> @llvm.amdgcn.image.load.2d.v4f16.i16(i32 3, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.2d.v4f16.i16(i32 3, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.v3f16.2d:			; GCN-LABEL: {{^}}load.v3f16.2d:
	; GCN: image_load v[0:1], v0, s[0:7] dmask:0x7 unorm a16 d16			; GFX9: image_load v[0:1], v0, s[0:7] dmask:0x7 unorm a16 d16
				; GFX10: image_load v[0:1], v0, s[0:7] dmask:0x7 dim:SQ_RSRC_IMG_2D unorm a16 d16
	define amdgpu_ps <4 x half> @load.v3f16.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x half> @load.v3f16.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x half> @llvm.amdgcn.image.load.2d.v4f16.i16(i32 7, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.2d.v4f16.i16(i32 7, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.v4f16.2d:			; GCN-LABEL: {{^}}load.v4f16.2d:
	; GCN: image_load v[0:1], v0, s[0:7] dmask:0xf unorm a16 d16			; GFX9: image_load v[0:1], v0, s[0:7] dmask:0xf unorm a16 d16
				; GFX10: image_load v[0:1], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm a16 d16
	define amdgpu_ps <4 x half> @load.v4f16.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x half> @load.v4f16.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x half> @llvm.amdgcn.image.load.2d.v4f16.i16(i32 15, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.2d.v4f16.i16(i32 15, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.f16.3d:			; GCN-LABEL: {{^}}load.f16.3d:
	; GCN: image_load v0, v[0:1], s[0:7] dmask:0x1 unorm a16 d16			; GFX9: image_load v0, v[0:1], s[0:7] dmask:0x1 unorm a16 d16
				; GFX10: image_load v0, v[0:1], s[0:7] dmask:0x1 dim:SQ_RSRC_IMG_3D unorm a16 d16
	define amdgpu_ps <4 x half> @load.f16.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x half> @load.f16.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x half> @llvm.amdgcn.image.load.3d.v4f16.i16(i32 1, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.3d.v4f16.i16(i32 1, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.v2f16.3d:			; GCN-LABEL: {{^}}load.v2f16.3d:
	; GCN: image_load v0, v[0:1], s[0:7] dmask:0x3 unorm a16 d16			; GFX9: image_load v0, v[0:1], s[0:7] dmask:0x3 unorm a16 d16
				; GFX10: image_load v0, v[0:1], s[0:7] dmask:0x3 dim:SQ_RSRC_IMG_3D unorm a16 d16
	define amdgpu_ps <4 x half> @load.v2f16.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x half> @load.v2f16.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x half> @llvm.amdgcn.image.load.3d.v4f16.i16(i32 3, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.3d.v4f16.i16(i32 3, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.v3f16.3d:			; GCN-LABEL: {{^}}load.v3f16.3d:
	; GCN: image_load v[0:1], v[0:1], s[0:7] dmask:0x7 unorm a16 d16			; GFX9: image_load v[0:1], v[0:1], s[0:7] dmask:0x7 unorm a16 d16
				; GFX10: image_load v[0:1], v[0:1], s[0:7] dmask:0x7 dim:SQ_RSRC_IMG_3D unorm a16 d16
	define amdgpu_ps <4 x half> @load.v3f16.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x half> @load.v3f16.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x half> @llvm.amdgcn.image.load.3d.v4f16.i16(i32 7, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.3d.v4f16.i16(i32 7, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.v4f16.3d:			; GCN-LABEL: {{^}}load.v4f16.3d:
	; GCN: image_load v[0:1], v[0:1], s[0:7] dmask:0xf unorm a16 d16			; GFX9: image_load v[0:1], v[0:1], s[0:7] dmask:0xf unorm a16 d16
				; GFX10: image_load v[0:1], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_3D unorm a16 d16
	define amdgpu_ps <4 x half> @load.v4f16.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x half> @load.v4f16.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x half> @llvm.amdgcn.image.load.3d.v4f16.i16(i32 15, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.3d.v4f16.i16(i32 15, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	declare <4 x half> @llvm.amdgcn.image.load.1d.v4f16.i16(i32, i16, <8 x i32>, i32, i32) #2			declare <4 x half> @llvm.amdgcn.image.load.1d.v4f16.i16(i32, i16, <8 x i32>, i32, i32) #2
	declare <4 x half> @llvm.amdgcn.image.load.2d.v4f16.i16(i32, i16, i16, <8 x i32>, i32, i32) #2			declare <4 x half> @llvm.amdgcn.image.load.2d.v4f16.i16(i32, i16, i16, <8 x i32>, i32, i32) #2
	declare <4 x half> @llvm.amdgcn.image.load.3d.v4f16.i16(i32, i16, i16, i16, <8 x i32>, i32, i32) #2			declare <4 x half> @llvm.amdgcn.image.load.3d.v4f16.i16(i32, i16, i16, i16, <8 x i32>, i32, i32) #2

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readonly }			attributes #1 = { nounwind readonly }

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.load.a16.ll

	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX9 %s
				; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX10 %s

	; GCN-LABEL: {{^}}load.f32.1d:			; GCN-LABEL: {{^}}load.f32.1d:
	; GCN: image_load v0, v0, s[0:7] dmask:0x1 unorm a16			; GFX9: image_load v0, v0, s[0:7] dmask:0x1 unorm a16
				; GFX10: image_load v0, v0, s[0:7] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm a16
	define amdgpu_ps <4 x float> @load.f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load.f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 1, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 1, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.v2f32.1d:			; GCN-LABEL: {{^}}load.v2f32.1d:
	; GCN: image_load v[0:1], v0, s[0:7] dmask:0x3 unorm a16			; GFX9: image_load v[0:1], v0, s[0:7] dmask:0x3 unorm a16
				; GFX10: image_load v[0:1], v0, s[0:7] dmask:0x3 dim:SQ_RSRC_IMG_1D unorm a16
	define amdgpu_ps <4 x float> @load.v2f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load.v2f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 3, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 3, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.v3f32.1d:			; GCN-LABEL: {{^}}load.v3f32.1d:
	; GCN: image_load v[0:2], v0, s[0:7] dmask:0x7 unorm a16			; GFX9: image_load v[0:2], v0, s[0:7] dmask:0x7 unorm a16
				; GFX10: image_load v[0:2], v0, s[0:7] dmask:0x7 dim:SQ_RSRC_IMG_1D unorm a16
	define amdgpu_ps <4 x float> @load.v3f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load.v3f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 7, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 7, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.v4f32.1d:			; GCN-LABEL: {{^}}load.v4f32.1d:
	; GCN: image_load v[0:3], v0, s[0:7] dmask:0xf unorm a16			; GFX9: image_load v[0:3], v0, s[0:7] dmask:0xf unorm a16
				; GFX10: image_load v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm a16
	define amdgpu_ps <4 x float> @load.v4f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load.v4f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.f32.2d:			; GCN-LABEL: {{^}}load.f32.2d:
	; GCN: image_load v0, v0, s[0:7] dmask:0x1 unorm a16			; GFX9: image_load v0, v0, s[0:7] dmask:0x1 unorm a16
				; GFX10: image_load v0, v0, s[0:7] dmask:0x1 dim:SQ_RSRC_IMG_2D unorm a16
	define amdgpu_ps <4 x float> @load.f32.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load.f32.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 1, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 1, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.v2f32.2d:			; GCN-LABEL: {{^}}load.v2f32.2d:
	; GCN: image_load v[0:1], v0, s[0:7] dmask:0x3 unorm a16			; GFX9: image_load v[0:1], v0, s[0:7] dmask:0x3 unorm a16
				; GFX10: image_load v[0:1], v0, s[0:7] dmask:0x3 dim:SQ_RSRC_IMG_2D unorm a16
	define amdgpu_ps <4 x float> @load.v2f32.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load.v2f32.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 3, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 3, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.v3f32.2d:			; GCN-LABEL: {{^}}load.v3f32.2d:
	; GCN: image_load v[0:2], v0, s[0:7] dmask:0x7 unorm a16			; GFX9: image_load v[0:2], v0, s[0:7] dmask:0x7 unorm a16
				; GFX10: image_load v[0:2], v0, s[0:7] dmask:0x7 dim:SQ_RSRC_IMG_2D unorm a16
	define amdgpu_ps <4 x float> @load.v3f32.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load.v3f32.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 7, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 7, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.v4f32.2d:			; GCN-LABEL: {{^}}load.v4f32.2d:
	; GCN: image_load v[0:3], v0, s[0:7] dmask:0xf unorm a16			; GFX9: image_load v[0:3], v0, s[0:7] dmask:0xf unorm a16
				; GFX10: image_load v[0:3], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm a16
	define amdgpu_ps <4 x float> @load.v4f32.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load.v4f32.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 15, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 15, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.f32.3d:			; GCN-LABEL: {{^}}load.f32.3d:
	; GCN: image_load v0, v[0:1], s[0:7] dmask:0x1 unorm a16			; GFX9: image_load v0, v[0:1], s[0:7] dmask:0x1 unorm a16
				; GFX10: image_load v0, v[0:1], s[0:7] dmask:0x1 dim:SQ_RSRC_IMG_3D unorm a16
	define amdgpu_ps <4 x float> @load.f32.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x float> @load.f32.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 1, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 1, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.v2f32.3d:			; GCN-LABEL: {{^}}load.v2f32.3d:
	; GCN: image_load v[0:1], v[0:1], s[0:7] dmask:0x3 unorm a16			; GFX9: image_load v[0:1], v[0:1], s[0:7] dmask:0x3 unorm a16
				; GFX10: image_load v[0:1], v[0:1], s[0:7] dmask:0x3 dim:SQ_RSRC_IMG_3D unorm a16
	define amdgpu_ps <4 x float> @load.v2f32.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x float> @load.v2f32.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 3, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 3, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.v3f32.3d:			; GCN-LABEL: {{^}}load.v3f32.3d:
	; GCN: image_load v[0:2], v[0:1], s[0:7] dmask:0x7 unorm a16			; GFX9: image_load v[0:2], v[0:1], s[0:7] dmask:0x7 unorm a16
				; GFX10: image_load v[0:2], v[0:1], s[0:7] dmask:0x7 dim:SQ_RSRC_IMG_3D unorm a16
	define amdgpu_ps <4 x float> @load.v3f32.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x float> @load.v3f32.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 7, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 7, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.v4f32.3d:			; GCN-LABEL: {{^}}load.v4f32.3d:
	; GCN: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16			; GFX9: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm a16
				; GFX10: image_load v[0:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_3D unorm a16
	define amdgpu_ps <4 x float> @load.v4f32.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x float> @load.v4f32.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 15, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 15, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	declare <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32, i16, <8 x i32>, i32, i32) #2			declare <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32, i16, <8 x i32>, i32, i32) #2
	declare <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32, i16, i16, <8 x i32>, i32, i32) #2			declare <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32, i16, i16, <8 x i32>, i32, i32) #2
	declare <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32, i16, i16, i16, <8 x i32>, i32, i32) #2			declare <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32, i16, i16, i16, <8 x i32>, i32, i32) #2

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readonly }			attributes #1 = { nounwind readonly }

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.a16.dim.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s
				; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s

	define amdgpu_ps <4 x float> @sample_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {			define amdgpu_ps <4 x float> @sample_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {
	; GCN-LABEL: sample_1d:			; GFX9-LABEL: sample_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f16(i32 15, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f16(i32 15, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {			define amdgpu_ps <4 x float> @sample_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {
	; GCN-LABEL: sample_2d:			; GFX9-LABEL: sample_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GCN-NEXT: v_lshl_or_b32 v0, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v0, v1, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
				; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f16(i32 15, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f16(i32 15, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_3d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %r) {			define amdgpu_ps <4 x float> @sample_3d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %r) {
	; GCN-LABEL: sample_3d:			; GFX9-LABEL: sample_3d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GCN-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
				; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_3D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f16(i32 15, half %s, half %t, half %r, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f16(i32 15, half %s, half %t, half %r, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_cube(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %face) {			define amdgpu_ps <4 x float> @sample_cube(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %face) {
	; GCN-LABEL: sample_cube:			; GFX9-LABEL: sample_cube:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GCN-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16 da			; GFX9-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16 da
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_cube:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
				; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_CUBE a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cube.v4f32.f16(i32 15, half %s, half %t, half %face, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cube.v4f32.f16(i32 15, half %s, half %t, half %face, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_1darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %slice) {			define amdgpu_ps <4 x float> @sample_1darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %slice) {
	; GCN-LABEL: sample_1darray:			; GFX9-LABEL: sample_1darray:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GCN-NEXT: v_lshl_or_b32 v0, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v0, v1, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16 da			; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16 da
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_1darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
				; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D_ARRAY a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f16(i32 15, half %s, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f16(i32 15, half %s, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_2darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %slice) {			define amdgpu_ps <4 x float> @sample_2darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %slice) {
	; GCN-LABEL: sample_2darray:			; GFX9-LABEL: sample_2darray:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GCN-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16 da			; GFX9-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16 da
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_2darray:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
				; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D_ARRAY a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.2darray.v4f32.f16(i32 15, half %s, half %t, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.2darray.v4f32.f16(i32 15, half %s, half %t, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s) {			define amdgpu_ps <4 x float> @sample_c_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s) {
	; GCN-LABEL: sample_c_1d:			; GFX9-LABEL: sample_c_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f16(i32 15, float %zcompare, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f16(i32 15, float %zcompare, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {			define amdgpu_ps <4 x float> @sample_c_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {
	; GCN-LABEL: sample_c_2d:			; GFX9-LABEL: sample_c_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GCN-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f16(i32 15, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f16(i32 15, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %clamp) {			define amdgpu_ps <4 x float> @sample_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %clamp) {
	; GCN-LABEL: sample_cl_1d:			; GFX9-LABEL: sample_cl_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GCN-NEXT: v_lshl_or_b32 v0, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v0, v1, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample_cl v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_cl v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_cl_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
				; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample_cl v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cl.1d.v4f32.f16(i32 15, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cl.1d.v4f32.f16(i32 15, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @sample_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %clamp) {
	; GCN-LABEL: sample_cl_2d:			; GFX9-LABEL: sample_cl_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GCN-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample_cl v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_cl v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_cl_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
				; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample_cl v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cl.2d.v4f32.f16(i32 15, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cl.2d.v4f32.f16(i32 15, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %clamp) {			define amdgpu_ps <4 x float> @sample_c_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %clamp) {
	; GCN-LABEL: sample_c_cl_1d:			; GFX9-LABEL: sample_c_cl_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GCN-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample_c_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_cl_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample_c_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.1d.v4f32.f16(i32 15, float %zcompare, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.1d.v4f32.f16(i32 15, float %zcompare, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @sample_c_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %clamp) {
	; GCN-LABEL: sample_c_cl_2d:			; GFX9-LABEL: sample_c_cl_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_mov_b32_e32 v5, v3			; GFX9-NEXT: v_mov_b32_e32 v5, v3
	; GCN-NEXT: v_mov_b32_e32 v3, v0			; GFX9-NEXT: v_mov_b32_e32 v3, v0
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1
	; GCN-NEXT: v_lshl_or_b32 v4, v2, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample_c_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_cl_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample_c_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f16(i32 15, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f16(i32 15, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s) {			define amdgpu_ps <4 x float> @sample_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s) {
	; GCN-LABEL: sample_b_1d:			; GFX9-LABEL: sample_b_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_b_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f16(i32 15, float %bias, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f16(i32 15, float %bias, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t) {			define amdgpu_ps <4 x float> @sample_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t) {
	; GCN-LABEL: sample_b_2d:			; GFX9-LABEL: sample_b_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GCN-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_b_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f16(i32 15, float %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f16(i32 15, float %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s) {			define amdgpu_ps <4 x float> @sample_c_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s) {
	; GCN-LABEL: sample_c_b_1d:			; GFX9-LABEL: sample_c_b_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_b_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t) {			define amdgpu_ps <4 x float> @sample_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t) {
	; GCN-LABEL: sample_c_b_2d:			; GFX9-LABEL: sample_c_b_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GCN-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_b_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %clamp) {			define amdgpu_ps <4 x float> @sample_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %clamp) {
	; GCN-LABEL: sample_b_cl_1d:			; GFX9-LABEL: sample_b_cl_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GCN-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample_b_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_b_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_b_cl_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample_b_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f16(i32 15, float %bias, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f16(i32 15, float %bias, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @sample_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t, half %clamp) {
	; GCN-LABEL: sample_b_cl_2d:			; GFX9-LABEL: sample_b_cl_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_mov_b32_e32 v5, v3			; GFX9-NEXT: v_mov_b32_e32 v5, v3
	; GCN-NEXT: v_mov_b32_e32 v3, v0			; GFX9-NEXT: v_mov_b32_e32 v3, v0
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1
	; GCN-NEXT: v_lshl_or_b32 v4, v2, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample_b_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_b_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_b_cl_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample_b_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f16(i32 15, float %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f16(i32 15, float %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %clamp) {			define amdgpu_ps <4 x float> @sample_c_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %clamp) {
	; GCN-LABEL: sample_c_b_cl_1d:			; GFX9-LABEL: sample_c_b_cl_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GCN-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample_c_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_b_cl_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample_c_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @sample_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t, half %clamp) {
	; GCN-LABEL: sample_c_b_cl_2d:			; GFX9-LABEL: sample_c_b_cl_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GCN-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GCN-NEXT: v_mov_b32_e32 v7, v4			; GFX9-NEXT: v_mov_b32_e32 v7, v4
	; GCN-NEXT: v_mov_b32_e32 v4, v0			; GFX9-NEXT: v_mov_b32_e32 v4, v0
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v2			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v2
	; GCN-NEXT: v_mov_b32_e32 v5, v1			; GFX9-NEXT: v_mov_b32_e32 v5, v1
	; GCN-NEXT: v_lshl_or_b32 v6, v3, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v6, v3, 16, v0
	; GCN-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GCN-NEXT: image_sample_c_b_cl v[0:3], v[4:7], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_b_cl v[0:3], v[4:7], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_b_cl_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s12, exec_lo
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
				; GFX10-NEXT: image_sample_c_b_cl v[0:3], [v0, v1, v2, v4], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_d_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s) {			define amdgpu_ps <4 x float> @sample_d_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s) {
	; GCN-LABEL: sample_d_1d:			; GFX9-LABEL: sample_d_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: image_sample_d v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_d v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_d_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_sample_d v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.d.1d.v4f32.f16.f16(i32 15, half %dsdh, half %dsdv, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.d.1d.v4f32.f16.f16(i32 15, half %dsdh, half %dsdv, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_d_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t) {			define amdgpu_ps <4 x float> @sample_d_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t) {
	; GCN-LABEL: sample_d_2d:			; GFX9-LABEL: sample_d_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_mov_b32_e32 v6, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v6, 0xffff
	; GCN-NEXT: v_and_b32_e32 v4, v6, v4			; GFX9-NEXT: v_and_b32_e32 v4, v6, v4
	; GCN-NEXT: v_and_b32_e32 v2, v6, v2			; GFX9-NEXT: v_and_b32_e32 v2, v6, v2
	; GCN-NEXT: v_and_b32_e32 v0, v6, v0			; GFX9-NEXT: v_and_b32_e32 v0, v6, v0
	; GCN-NEXT: v_lshl_or_b32 v3, v3, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v3, v3, 16, v2
	; GCN-NEXT: v_lshl_or_b32 v4, v5, 16, v4			; GFX9-NEXT: v_lshl_or_b32 v4, v5, 16, v4
	; GCN-NEXT: v_lshl_or_b32 v2, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v2, v1, 16, v0
	; GCN-NEXT: image_sample_d v[0:3], v[2:4], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_d v[0:3], v[2:4], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_d_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_mov_b32_e32 v7, 0xffff
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_and_b32_e32 v4, v7, v4
				; GFX10-NEXT: v_and_b32_e32 v2, v7, v2
				; GFX10-NEXT: v_and_b32_e32 v0, v7, v0
				; GFX10-NEXT: v_lshl_or_b32 v4, v5, 16, v4
				; GFX10-NEXT: v_lshl_or_b32 v3, v3, 16, v2
				; GFX10-NEXT: v_lshl_or_b32 v2, v1, 16, v0
				; GFX10-NEXT: image_sample_d v[0:3], v[2:4], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f16.f16(i32 15, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f16.f16(i32 15, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_d_3d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %drdh, half %dsdv, half %dtdv, half %drdv, half %s, half %t, half %r) {			define amdgpu_ps <4 x float> @sample_d_3d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %drdh, half %dsdv, half %dtdv, half %drdv, half %s, half %t, half %r) {
	; GCN-LABEL: sample_d_3d:			; GFX9-LABEL: sample_d_3d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_mov_b32_e32 v12, v8			; GFX9-NEXT: v_mov_b32_e32 v12, v8
	; GCN-NEXT: v_mov_b32_e32 v8, v2			; GFX9-NEXT: v_mov_b32_e32 v8, v2
	; GCN-NEXT: v_mov_b32_e32 v2, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v2, 0xffff
	; GCN-NEXT: v_mov_b32_e32 v10, v5			; GFX9-NEXT: v_mov_b32_e32 v10, v5
	; GCN-NEXT: v_and_b32_e32 v5, v2, v6			; GFX9-NEXT: v_and_b32_e32 v5, v2, v6
	; GCN-NEXT: v_and_b32_e32 v3, v2, v3			; GFX9-NEXT: v_and_b32_e32 v3, v2, v3
	; GCN-NEXT: v_and_b32_e32 v0, v2, v0			; GFX9-NEXT: v_and_b32_e32 v0, v2, v0
	; GCN-NEXT: v_lshl_or_b32 v11, v7, 16, v5			; GFX9-NEXT: v_lshl_or_b32 v11, v7, 16, v5
	; GCN-NEXT: v_lshl_or_b32 v9, v4, 16, v3			; GFX9-NEXT: v_lshl_or_b32 v9, v4, 16, v3
	; GCN-NEXT: v_lshl_or_b32 v7, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v7, v1, 16, v0
	; GCN-NEXT: image_sample_d v[0:3], v[7:14], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_d v[0:3], v[7:14], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_d_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_mov_b32_e32 v9, 0xffff
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_and_b32_e32 v6, v9, v6
				; GFX10-NEXT: v_and_b32_e32 v3, v9, v3
				; GFX10-NEXT: v_and_b32_e32 v0, v9, v0
				; GFX10-NEXT: v_lshl_or_b32 v6, v7, 16, v6
				; GFX10-NEXT: v_lshl_or_b32 v3, v4, 16, v3
				; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0
				; GFX10-NEXT: image_sample_d v[0:3], [v0, v2, v3, v5, v6, v8], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_3D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.d.3d.v4f32.f16.f16(i32 15, half %dsdh, half %dtdh, half %drdh, half %dsdv, half %dtdv, half %drdv, half %s, half %t, half %r, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.d.3d.v4f32.f16.f16(i32 15, half %dsdh, half %dtdh, half %drdh, half %dsdv, half %dtdv, half %drdv, half %s, half %t, half %r, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_d_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, half %s) {			define amdgpu_ps <4 x float> @sample_c_d_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, half %s) {
	; GCN-LABEL: sample_c_d_1d:			; GFX9-LABEL: sample_c_d_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: image_sample_c_d v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_d v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_d_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_sample_c_d v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.1d.v4f32.f32.f16(i32 15, float %zcompare, half %dsdh, half %dsdv, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.1d.v4f32.f32.f16(i32 15, float %zcompare, half %dsdh, half %dsdv, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_d_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t) {			define amdgpu_ps <4 x float> @sample_c_d_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t) {
	; GCN-LABEL: sample_c_d_2d:			; GFX9-LABEL: sample_c_d_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_mov_b32_e32 v9, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v9, 0xffff
	; GCN-NEXT: v_mov_b32_e32 v8, v2			; GFX9-NEXT: v_mov_b32_e32 v8, v2
	; GCN-NEXT: v_mov_b32_e32 v7, v3			; GFX9-NEXT: v_mov_b32_e32 v7, v3
	; GCN-NEXT: v_and_b32_e32 v2, v9, v5			; GFX9-NEXT: v_and_b32_e32 v2, v9, v5
	; GCN-NEXT: v_and_b32_e32 v1, v9, v1			; GFX9-NEXT: v_and_b32_e32 v1, v9, v1
	; GCN-NEXT: v_lshl_or_b32 v3, v6, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v3, v6, 16, v2
	; GCN-NEXT: v_and_b32_e32 v2, v9, v7			; GFX9-NEXT: v_and_b32_e32 v2, v9, v7
	; GCN-NEXT: v_lshl_or_b32 v2, v4, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v2, v4, 16, v2
	; GCN-NEXT: v_lshl_or_b32 v1, v8, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v8, 16, v1
	; GCN-NEXT: image_sample_c_d v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_d v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_d_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_mov_b32_e32 v10, 0xffff
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_and_b32_e32 v3, v10, v3
				; GFX10-NEXT: v_and_b32_e32 v1, v10, v1
				; GFX10-NEXT: v_and_b32_e32 v5, v10, v5
				; GFX10-NEXT: v_lshl_or_b32 v3, v4, 16, v3
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: v_lshl_or_b32 v6, v6, 16, v5
				; GFX10-NEXT: image_sample_c_d v[0:3], [v0, v1, v3, v6], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f32.f16(i32 15, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f32.f16(i32 15, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_d_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s, half %clamp) {			define amdgpu_ps <4 x float> @sample_d_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s, half %clamp) {
	; GCN-LABEL: sample_d_cl_1d:			; GFX9-LABEL: sample_d_cl_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GCN-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GCN-NEXT: image_sample_d_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_d_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_d_cl_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
				; GFX10-NEXT: image_sample_d_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.d.cl.1d.v4f32.f16.f16(i32 15, half %dsdh, half %dsdv, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.d.cl.1d.v4f32.f16.f16(i32 15, half %dsdh, half %dsdv, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_d_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @sample_d_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp) {
	; GCN-LABEL: sample_d_cl_2d:			; GFX9-LABEL: sample_d_cl_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_mov_b32_e32 v7, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v7, 0xffff
	; GCN-NEXT: v_and_b32_e32 v4, v7, v4			; GFX9-NEXT: v_and_b32_e32 v4, v7, v4
	; GCN-NEXT: v_and_b32_e32 v2, v7, v2			; GFX9-NEXT: v_and_b32_e32 v2, v7, v2
	; GCN-NEXT: v_and_b32_e32 v0, v7, v0			; GFX9-NEXT: v_and_b32_e32 v0, v7, v0
	; GCN-NEXT: v_lshl_or_b32 v5, v5, 16, v4			; GFX9-NEXT: v_lshl_or_b32 v5, v5, 16, v4
	; GCN-NEXT: v_lshl_or_b32 v4, v3, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v4, v3, 16, v2
	; GCN-NEXT: v_lshl_or_b32 v3, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v3, v1, 16, v0
	; GCN-NEXT: image_sample_d_cl v[0:3], v[3:6], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_d_cl v[0:3], v[3:6], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_d_cl_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_mov_b32_e32 v7, 0xffff
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_and_b32_e32 v0, v7, v0
				; GFX10-NEXT: v_and_b32_e32 v4, v7, v4
				; GFX10-NEXT: v_and_b32_e32 v2, v7, v2
				; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0
				; GFX10-NEXT: v_lshl_or_b32 v5, v5, 16, v4
				; GFX10-NEXT: v_lshl_or_b32 v3, v3, 16, v2
				; GFX10-NEXT: image_sample_d_cl v[0:3], [v0, v3, v5, v6], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f16.f16(i32 15, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f16.f16(i32 15, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_d_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, half %s, half %clamp) {			define amdgpu_ps <4 x float> @sample_c_d_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, half %s, half %clamp) {
	; GCN-LABEL: sample_c_d_cl_1d:			; GFX9-LABEL: sample_c_d_cl_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v3			; GFX9-NEXT: v_and_b32_e32 v3, 0xffff, v3
	; GCN-NEXT: v_lshl_or_b32 v3, v4, 16, v3			; GFX9-NEXT: v_lshl_or_b32 v3, v4, 16, v3
	; GCN-NEXT: image_sample_c_d_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_d_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_d_cl_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_and_b32_e32 v3, 0xffff, v3
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_lshl_or_b32 v3, v4, 16, v3
				; GFX10-NEXT: image_sample_c_d_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.1d.v4f32.f32.f16(i32 15, float %zcompare, half %dsdh, half %dsdv, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.1d.v4f32.f32.f16(i32 15, float %zcompare, half %dsdh, half %dsdv, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_d_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @sample_c_d_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp) {
	; GCN-LABEL: sample_c_d_cl_2d:			; GFX9-LABEL: sample_c_d_cl_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_mov_b32_e32 v11, v7			; GFX9-NEXT: v_mov_b32_e32 v11, v7
	; GCN-NEXT: v_mov_b32_e32 v7, v0			; GFX9-NEXT: v_mov_b32_e32 v7, v0
	; GCN-NEXT: v_mov_b32_e32 v0, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v0, 0xffff
	; GCN-NEXT: v_and_b32_e32 v5, v0, v5			; GFX9-NEXT: v_and_b32_e32 v5, v0, v5
	; GCN-NEXT: v_and_b32_e32 v3, v0, v3			; GFX9-NEXT: v_and_b32_e32 v3, v0, v3
	; GCN-NEXT: v_and_b32_e32 v0, v0, v1			; GFX9-NEXT: v_and_b32_e32 v0, v0, v1
	; GCN-NEXT: v_lshl_or_b32 v10, v6, 16, v5			; GFX9-NEXT: v_lshl_or_b32 v10, v6, 16, v5
	; GCN-NEXT: v_lshl_or_b32 v9, v4, 16, v3			; GFX9-NEXT: v_lshl_or_b32 v9, v4, 16, v3
	; GCN-NEXT: v_lshl_or_b32 v8, v2, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v8, v2, 16, v0
	; GCN-NEXT: image_sample_c_d_cl v[0:3], v[7:14], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_d_cl v[0:3], v[7:14], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_d_cl_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_mov_b32_e32 v8, 0xffff
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_and_b32_e32 v5, v8, v5
				; GFX10-NEXT: v_and_b32_e32 v1, v8, v1
				; GFX10-NEXT: v_and_b32_e32 v3, v8, v3
				; GFX10-NEXT: v_lshl_or_b32 v5, v6, 16, v5
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: v_lshl_or_b32 v6, v4, 16, v3
				; GFX10-NEXT: image_sample_c_d_cl v[0:3], [v0, v1, v6, v5, v7], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.2d.v4f32.f32.f16(i32 15, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.2d.v4f32.f32.f16(i32 15, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_cd_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s) {			define amdgpu_ps <4 x float> @sample_cd_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s) {
	; GCN-LABEL: sample_cd_1d:			; GFX9-LABEL: sample_cd_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: image_sample_cd v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_cd v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_cd_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_sample_cd v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cd.1d.v4f32.f16.f16(i32 15, half %dsdh, half %dsdv, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cd.1d.v4f32.f16.f16(i32 15, half %dsdh, half %dsdv, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_cd_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t) {			define amdgpu_ps <4 x float> @sample_cd_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t) {
	; GCN-LABEL: sample_cd_2d:			; GFX9-LABEL: sample_cd_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_mov_b32_e32 v6, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v6, 0xffff
	; GCN-NEXT: v_and_b32_e32 v4, v6, v4			; GFX9-NEXT: v_and_b32_e32 v4, v6, v4
	; GCN-NEXT: v_and_b32_e32 v2, v6, v2			; GFX9-NEXT: v_and_b32_e32 v2, v6, v2
	; GCN-NEXT: v_and_b32_e32 v0, v6, v0			; GFX9-NEXT: v_and_b32_e32 v0, v6, v0
	; GCN-NEXT: v_lshl_or_b32 v3, v3, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v3, v3, 16, v2
	; GCN-NEXT: v_lshl_or_b32 v4, v5, 16, v4			; GFX9-NEXT: v_lshl_or_b32 v4, v5, 16, v4
	; GCN-NEXT: v_lshl_or_b32 v2, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v2, v1, 16, v0
	; GCN-NEXT: image_sample_cd v[0:3], v[2:4], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_cd v[0:3], v[2:4], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_cd_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_mov_b32_e32 v7, 0xffff
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_and_b32_e32 v4, v7, v4
				; GFX10-NEXT: v_and_b32_e32 v2, v7, v2
				; GFX10-NEXT: v_and_b32_e32 v0, v7, v0
				; GFX10-NEXT: v_lshl_or_b32 v4, v5, 16, v4
				; GFX10-NEXT: v_lshl_or_b32 v3, v3, 16, v2
				; GFX10-NEXT: v_lshl_or_b32 v2, v1, 16, v0
				; GFX10-NEXT: image_sample_cd v[0:3], v[2:4], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cd.2d.v4f32.f16.f16(i32 15, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cd.2d.v4f32.f16.f16(i32 15, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_cd_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, half %s) {			define amdgpu_ps <4 x float> @sample_c_cd_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, half %s) {
	; GCN-LABEL: sample_c_cd_1d:			; GFX9-LABEL: sample_c_cd_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: image_sample_c_cd v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_cd v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_cd_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_sample_c_cd v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cd.1d.v4f32.f32.f16(i32 15, float %zcompare, half %dsdh, half %dsdv, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cd.1d.v4f32.f32.f16(i32 15, float %zcompare, half %dsdh, half %dsdv, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_cd_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t) {			define amdgpu_ps <4 x float> @sample_c_cd_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t) {
	; GCN-LABEL: sample_c_cd_2d:			; GFX9-LABEL: sample_c_cd_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_mov_b32_e32 v9, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v9, 0xffff
	; GCN-NEXT: v_mov_b32_e32 v8, v2			; GFX9-NEXT: v_mov_b32_e32 v8, v2
	; GCN-NEXT: v_mov_b32_e32 v7, v3			; GFX9-NEXT: v_mov_b32_e32 v7, v3
	; GCN-NEXT: v_and_b32_e32 v2, v9, v5			; GFX9-NEXT: v_and_b32_e32 v2, v9, v5
	; GCN-NEXT: v_and_b32_e32 v1, v9, v1			; GFX9-NEXT: v_and_b32_e32 v1, v9, v1
	; GCN-NEXT: v_lshl_or_b32 v3, v6, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v3, v6, 16, v2
	; GCN-NEXT: v_and_b32_e32 v2, v9, v7			; GFX9-NEXT: v_and_b32_e32 v2, v9, v7
	; GCN-NEXT: v_lshl_or_b32 v2, v4, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v2, v4, 16, v2
	; GCN-NEXT: v_lshl_or_b32 v1, v8, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v8, 16, v1
	; GCN-NEXT: image_sample_c_cd v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_cd v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_cd_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_mov_b32_e32 v10, 0xffff
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_and_b32_e32 v3, v10, v3
				; GFX10-NEXT: v_and_b32_e32 v1, v10, v1
				; GFX10-NEXT: v_and_b32_e32 v5, v10, v5
				; GFX10-NEXT: v_lshl_or_b32 v3, v4, 16, v3
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: v_lshl_or_b32 v6, v6, 16, v5
				; GFX10-NEXT: image_sample_c_cd v[0:3], [v0, v1, v3, v6], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cd.2d.v4f32.f32.f16(i32 15, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cd.2d.v4f32.f32.f16(i32 15, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_cd_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s, half %clamp) {			define amdgpu_ps <4 x float> @sample_cd_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s, half %clamp) {
	; GCN-LABEL: sample_cd_cl_1d:			; GFX9-LABEL: sample_cd_cl_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GCN-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GCN-NEXT: image_sample_cd_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_cd_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_cd_cl_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
				; GFX10-NEXT: image_sample_cd_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.1d.v4f32.f16.f16(i32 15, half %dsdh, half %dsdv, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.1d.v4f32.f16.f16(i32 15, half %dsdh, half %dsdv, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_cd_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @sample_cd_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp) {
	; GCN-LABEL: sample_cd_cl_2d:			; GFX9-LABEL: sample_cd_cl_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_mov_b32_e32 v7, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v7, 0xffff
	; GCN-NEXT: v_and_b32_e32 v4, v7, v4			; GFX9-NEXT: v_and_b32_e32 v4, v7, v4
	; GCN-NEXT: v_and_b32_e32 v2, v7, v2			; GFX9-NEXT: v_and_b32_e32 v2, v7, v2
	; GCN-NEXT: v_and_b32_e32 v0, v7, v0			; GFX9-NEXT: v_and_b32_e32 v0, v7, v0
	; GCN-NEXT: v_lshl_or_b32 v5, v5, 16, v4			; GFX9-NEXT: v_lshl_or_b32 v5, v5, 16, v4
	; GCN-NEXT: v_lshl_or_b32 v4, v3, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v4, v3, 16, v2
	; GCN-NEXT: v_lshl_or_b32 v3, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v3, v1, 16, v0
	; GCN-NEXT: image_sample_cd_cl v[0:3], v[3:6], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_cd_cl v[0:3], v[3:6], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_cd_cl_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_mov_b32_e32 v7, 0xffff
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_and_b32_e32 v0, v7, v0
				; GFX10-NEXT: v_and_b32_e32 v4, v7, v4
				; GFX10-NEXT: v_and_b32_e32 v2, v7, v2
				; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0
				; GFX10-NEXT: v_lshl_or_b32 v5, v5, 16, v4
				; GFX10-NEXT: v_lshl_or_b32 v3, v3, 16, v2
				; GFX10-NEXT: image_sample_cd_cl v[0:3], [v0, v3, v5, v6], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.2d.v4f32.f16.f16(i32 15, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.2d.v4f32.f16.f16(i32 15, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_cd_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, half %s, half %clamp) {			define amdgpu_ps <4 x float> @sample_c_cd_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, half %s, half %clamp) {
	; GCN-LABEL: sample_c_cd_cl_1d:			; GFX9-LABEL: sample_c_cd_cl_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_and_b32_e32 v3, 0xffff, v3			; GFX9-NEXT: v_and_b32_e32 v3, 0xffff, v3
	; GCN-NEXT: v_lshl_or_b32 v3, v4, 16, v3			; GFX9-NEXT: v_lshl_or_b32 v3, v4, 16, v3
	; GCN-NEXT: image_sample_c_cd_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_cd_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_cd_cl_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_and_b32_e32 v3, 0xffff, v3
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_lshl_or_b32 v3, v4, 16, v3
				; GFX10-NEXT: image_sample_c_cd_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.1d.v4f32.f32.f16(i32 15, float %zcompare, half %dsdh, half %dsdv, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.1d.v4f32.f32.f16(i32 15, float %zcompare, half %dsdh, half %dsdv, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_cd_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @sample_c_cd_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp) {
	; GCN-LABEL: sample_c_cd_cl_2d:			; GFX9-LABEL: sample_c_cd_cl_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_mov_b32_e32 v11, v7			; GFX9-NEXT: v_mov_b32_e32 v11, v7
	; GCN-NEXT: v_mov_b32_e32 v7, v0			; GFX9-NEXT: v_mov_b32_e32 v7, v0
	; GCN-NEXT: v_mov_b32_e32 v0, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v0, 0xffff
	; GCN-NEXT: v_and_b32_e32 v5, v0, v5			; GFX9-NEXT: v_and_b32_e32 v5, v0, v5
	; GCN-NEXT: v_and_b32_e32 v3, v0, v3			; GFX9-NEXT: v_and_b32_e32 v3, v0, v3
	; GCN-NEXT: v_and_b32_e32 v0, v0, v1			; GFX9-NEXT: v_and_b32_e32 v0, v0, v1
	; GCN-NEXT: v_lshl_or_b32 v10, v6, 16, v5			; GFX9-NEXT: v_lshl_or_b32 v10, v6, 16, v5
	; GCN-NEXT: v_lshl_or_b32 v9, v4, 16, v3			; GFX9-NEXT: v_lshl_or_b32 v9, v4, 16, v3
	; GCN-NEXT: v_lshl_or_b32 v8, v2, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v8, v2, 16, v0
	; GCN-NEXT: image_sample_c_cd_cl v[0:3], v[7:14], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_cd_cl v[0:3], v[7:14], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_cd_cl_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_mov_b32_e32 v8, 0xffff
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_and_b32_e32 v5, v8, v5
				; GFX10-NEXT: v_and_b32_e32 v1, v8, v1
				; GFX10-NEXT: v_and_b32_e32 v3, v8, v3
				; GFX10-NEXT: v_lshl_or_b32 v5, v6, 16, v5
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: v_lshl_or_b32 v6, v4, 16, v3
				; GFX10-NEXT: image_sample_c_cd_cl v[0:3], [v0, v1, v6, v5, v7], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.2d.v4f32.f32.f16(i32 15, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.2d.v4f32.f32.f16(i32 15, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_l_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %lod) {			define amdgpu_ps <4 x float> @sample_l_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %lod) {
	; GCN-LABEL: sample_l_1d:			; GFX9-LABEL: sample_l_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GCN-NEXT: v_lshl_or_b32 v0, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v0, v1, 16, v0
	; GCN-NEXT: image_sample_l v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_l v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_l_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0
				; GFX10-NEXT: image_sample_l v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.l.1d.v4f32.f16(i32 15, half %s, half %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.l.1d.v4f32.f16(i32 15, half %s, half %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_l_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %lod) {			define amdgpu_ps <4 x float> @sample_l_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %lod) {
	; GCN-LABEL: sample_l_2d:			; GFX9-LABEL: sample_l_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GCN-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GCN-NEXT: image_sample_l v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_l v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_l_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0
				; GFX10-NEXT: image_sample_l v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.l.2d.v4f32.f16(i32 15, half %s, half %t, half %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.l.2d.v4f32.f16(i32 15, half %s, half %t, half %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_l_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %lod) {			define amdgpu_ps <4 x float> @sample_c_l_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %lod) {
	; GCN-LABEL: sample_c_l_1d:			; GFX9-LABEL: sample_c_l_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GCN-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GCN-NEXT: image_sample_c_l v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_l v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_l_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: image_sample_c_l v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.l.1d.v4f32.f16(i32 15, float %zcompare, half %s, half %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.l.1d.v4f32.f16(i32 15, float %zcompare, half %s, half %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_l_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %lod) {			define amdgpu_ps <4 x float> @sample_c_l_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %lod) {
	; GCN-LABEL: sample_c_l_2d:			; GFX9-LABEL: sample_c_l_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_mov_b32_e32 v5, v3			; GFX9-NEXT: v_mov_b32_e32 v5, v3
	; GCN-NEXT: v_mov_b32_e32 v3, v0			; GFX9-NEXT: v_mov_b32_e32 v3, v0
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1
	; GCN-NEXT: v_lshl_or_b32 v4, v2, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0
	; GCN-NEXT: image_sample_c_l v[0:3], v[3:5], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_l v[0:3], v[3:5], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_l_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: image_sample_c_l v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.l.2d.v4f32.f16(i32 15, float %zcompare, half %s, half %t, half %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.l.2d.v4f32.f16(i32 15, float %zcompare, half %s, half %t, half %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_lz_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {			define amdgpu_ps <4 x float> @sample_lz_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {
	; GCN-LABEL: sample_lz_1d:			; GFX9-LABEL: sample_lz_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: image_sample_lz v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_lz v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_lz_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_sample_lz v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.lz.1d.v4f32.f16(i32 15, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.lz.1d.v4f32.f16(i32 15, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_lz_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {			define amdgpu_ps <4 x float> @sample_lz_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {
	; GCN-LABEL: sample_lz_2d:			; GFX9-LABEL: sample_lz_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GCN-NEXT: v_lshl_or_b32 v0, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v0, v1, 16, v0
	; GCN-NEXT: image_sample_lz v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_lz v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_lz_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0
				; GFX10-NEXT: image_sample_lz v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f16(i32 15, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f16(i32 15, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_lz_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s) {			define amdgpu_ps <4 x float> @sample_c_lz_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s) {
	; GCN-LABEL: sample_c_lz_1d:			; GFX9-LABEL: sample_c_lz_1d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: image_sample_c_lz v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_lz v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_lz_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: image_sample_c_lz v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.lz.1d.v4f32.f16(i32 15, float %zcompare, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.lz.1d.v4f32.f16(i32 15, float %zcompare, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_lz_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {			define amdgpu_ps <4 x float> @sample_c_lz_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {
	; GCN-LABEL: sample_c_lz_2d:			; GFX9-LABEL: sample_c_lz_2d:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GCN-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GCN-NEXT: image_sample_c_lz v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_lz v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_lz_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
				; GFX10-NEXT: image_sample_c_lz v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.lz.2d.v4f32.f16(i32 15, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.lz.2d.v4f32.f16(i32 15, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps float @sample_c_d_o_2darray_V1(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %slice) {			define amdgpu_ps float @sample_c_d_o_2darray_V1(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %slice) {
	; GCN-LABEL: sample_c_d_o_2darray_V1:			; GFX9-LABEL: sample_c_d_o_2darray_V1:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_mov_b32_e32 v13, v8			; GFX9-NEXT: v_mov_b32_e32 v13, v8
	; GCN-NEXT: v_mov_b32_e32 v8, v0			; GFX9-NEXT: v_mov_b32_e32 v8, v0
	; GCN-NEXT: v_mov_b32_e32 v0, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v0, 0xffff
	; GCN-NEXT: v_mov_b32_e32 v9, v1			; GFX9-NEXT: v_mov_b32_e32 v9, v1
	; GCN-NEXT: v_and_b32_e32 v1, v0, v6			; GFX9-NEXT: v_and_b32_e32 v1, v0, v6
	; GCN-NEXT: v_lshl_or_b32 v12, v7, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v12, v7, 16, v1
	; GCN-NEXT: v_and_b32_e32 v1, v0, v4			; GFX9-NEXT: v_and_b32_e32 v1, v0, v4
	; GCN-NEXT: v_and_b32_e32 v0, v0, v2			; GFX9-NEXT: v_and_b32_e32 v0, v0, v2
	; GCN-NEXT: v_lshl_or_b32 v11, v5, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v11, v5, 16, v1
	; GCN-NEXT: v_lshl_or_b32 v10, v3, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v10, v3, 16, v0
	; GCN-NEXT: image_sample_c_d_o v0, v[8:15], s[0:7], s[8:11] dmask:0x4 a16 da			; GFX9-NEXT: image_sample_c_d_o v0, v[8:15], s[0:7], s[8:11] dmask:0x4 a16 da
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_d_o_2darray_V1:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_mov_b32_e32 v9, 0xffff
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_and_b32_e32 v4, v9, v4
				; GFX10-NEXT: v_and_b32_e32 v2, v9, v2
				; GFX10-NEXT: v_and_b32_e32 v6, v9, v6
				; GFX10-NEXT: v_lshl_or_b32 v4, v5, 16, v4
				; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
				; GFX10-NEXT: v_lshl_or_b32 v7, v7, 16, v6
				; GFX10-NEXT: image_sample_c_d_o v0, [v0, v1, v2, v4, v7, v8], s[0:7], s[8:11] dmask:0x4 dim:SQ_RSRC_IMG_2D_ARRAY a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call float @llvm.amdgcn.image.sample.c.d.o.2darray.f32.f16.f16(i32 4, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call float @llvm.amdgcn.image.sample.c.d.o.2darray.f32.f16.f16(i32 4, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret float %v			ret float %v
	}			}

	define amdgpu_ps <2 x float> @sample_c_d_o_2darray_V2(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %slice) {			define amdgpu_ps <2 x float> @sample_c_d_o_2darray_V2(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %slice) {
	; GCN-LABEL: sample_c_d_o_2darray_V2:			; GFX9-LABEL: sample_c_d_o_2darray_V2:
	; GCN: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GCN-NEXT: v_mov_b32_e32 v13, v8			; GFX9-NEXT: v_mov_b32_e32 v13, v8
	; GCN-NEXT: v_mov_b32_e32 v8, v0			; GFX9-NEXT: v_mov_b32_e32 v8, v0
	; GCN-NEXT: v_mov_b32_e32 v0, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v0, 0xffff
	; GCN-NEXT: v_mov_b32_e32 v9, v1			; GFX9-NEXT: v_mov_b32_e32 v9, v1
	; GCN-NEXT: v_and_b32_e32 v1, v0, v6			; GFX9-NEXT: v_and_b32_e32 v1, v0, v6
	; GCN-NEXT: v_lshl_or_b32 v12, v7, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v12, v7, 16, v1
	; GCN-NEXT: v_and_b32_e32 v1, v0, v4			; GFX9-NEXT: v_and_b32_e32 v1, v0, v4
	; GCN-NEXT: v_and_b32_e32 v0, v0, v2			; GFX9-NEXT: v_and_b32_e32 v0, v0, v2
	; GCN-NEXT: v_lshl_or_b32 v11, v5, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v11, v5, 16, v1
	; GCN-NEXT: v_lshl_or_b32 v10, v3, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v10, v3, 16, v0
	; GCN-NEXT: image_sample_c_d_o v[0:1], v[8:15], s[0:7], s[8:11] dmask:0x6 a16 da			; GFX9-NEXT: image_sample_c_d_o v[0:1], v[8:15], s[0:7], s[8:11] dmask:0x6 a16 da
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: sample_c_d_o_2darray_V2:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: v_mov_b32_e32 v9, 0xffff
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_and_b32_e32 v4, v9, v4
				; GFX10-NEXT: v_and_b32_e32 v2, v9, v2
				; GFX10-NEXT: v_and_b32_e32 v6, v9, v6
				; GFX10-NEXT: v_lshl_or_b32 v4, v5, 16, v4
				; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
				; GFX10-NEXT: v_lshl_or_b32 v7, v7, 16, v6
				; GFX10-NEXT: image_sample_c_d_o v[0:1], [v0, v1, v2, v4, v7, v8], s[0:7], s[8:11] dmask:0x6 dim:SQ_RSRC_IMG_2D_ARRAY a16
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f32.f16(i32 6, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f32.f16(i32 6, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <2 x float> %v			ret <2 x float> %v
	}			}

	declare <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f16(i32, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f16(i32, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <8 x float> @llvm.amdgcn.image.sample.1d.v8f32.f16(i32, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <8 x float> @llvm.amdgcn.image.sample.1d.v8f32.f16(i32, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f16(i32, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f16(i32, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.store.a16.d16.ll

	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s
	; GCN-LABEL: {{^}}store.f16.1d:			; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s
	; GCN: image_store v[1:2], v0, s[0:7] dmask:0x1 unorm a16 d16
	define amdgpu_ps void @store.f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <2 x i32> %val) {			define amdgpu_ps void @store_f16_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <2 x i32> %val) {
				; GFX9-LABEL: store_f16_1d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[1:2], v0, s[0:7] dmask:0x1 unorm a16 d16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_f16_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[1:2], v0, s[0:7] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm a16 d16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%bitcast = bitcast <2 x i32> %val to <4 x half>			%bitcast = bitcast <2 x i32> %val to <4 x half>
	call void @llvm.amdgcn.image.store.1d.v4f16.i16(<4 x half> %bitcast, i32 1, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.1d.v4f16.i16(<4 x half> %bitcast, i32 1, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v2f16.1d:			define amdgpu_ps void @store_v2f16_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <2 x i32> %val) {
	; GCN: image_store v[1:2], v0, s[0:7] dmask:0x3 unorm a16 d16			; GFX9-LABEL: store_v2f16_1d:
	define amdgpu_ps void @store.v2f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <2 x i32> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[1:2], v0, s[0:7] dmask:0x3 unorm a16 d16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v2f16_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[1:2], v0, s[0:7] dmask:0x3 dim:SQ_RSRC_IMG_1D unorm a16 d16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%bitcast = bitcast <2 x i32> %val to <4 x half>			%bitcast = bitcast <2 x i32> %val to <4 x half>
	call void @llvm.amdgcn.image.store.1d.v4f16.i16(<4 x half> %bitcast, i32 3, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.1d.v4f16.i16(<4 x half> %bitcast, i32 3, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v3f16.1d:			define amdgpu_ps void @store_v3f16_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <2 x i32> %val) {
	; GCN: image_store v[1:2], v0, s[0:7] dmask:0x7 unorm a16 d16			; GFX9-LABEL: store_v3f16_1d:
	define amdgpu_ps void @store.v3f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <2 x i32> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[1:2], v0, s[0:7] dmask:0x7 unorm a16 d16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v3f16_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[1:2], v0, s[0:7] dmask:0x7 dim:SQ_RSRC_IMG_1D unorm a16 d16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%bitcast = bitcast <2 x i32> %val to <4 x half>			%bitcast = bitcast <2 x i32> %val to <4 x half>
	call void @llvm.amdgcn.image.store.1d.v4f16.i16(<4 x half> %bitcast, i32 7, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.1d.v4f16.i16(<4 x half> %bitcast, i32 7, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v4f16.1d:			define amdgpu_ps void @store_v4f16_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <2 x i32> %val) {
	; GCN: image_store v[1:2], v0, s[0:7] dmask:0xf unorm a16 d16			; GFX9-LABEL: store_v4f16_1d:
	define amdgpu_ps void @store.v4f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <2 x i32> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[1:2], v0, s[0:7] dmask:0xf unorm a16 d16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v4f16_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[1:2], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm a16 d16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%bitcast = bitcast <2 x i32> %val to <4 x half>			%bitcast = bitcast <2 x i32> %val to <4 x half>
	call void @llvm.amdgcn.image.store.1d.v4f16.i16(<4 x half> %bitcast, i32 15, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.1d.v4f16.i16(<4 x half> %bitcast, i32 15, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.f16.2d:			define amdgpu_ps void @store_f16_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <2 x i32> %val) {
	; GCN: image_store v[1:2], v0, s[0:7] dmask:0x1 unorm a16 d16			; GFX9-LABEL: store_f16_2d:
	define amdgpu_ps void @store.f16.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <2 x i32> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[1:2], v0, s[0:7] dmask:0x1 unorm a16 d16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_f16_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[1:2], v0, s[0:7] dmask:0x1 dim:SQ_RSRC_IMG_2D unorm a16 d16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%bitcast = bitcast <2 x i32> %val to <4 x half>			%bitcast = bitcast <2 x i32> %val to <4 x half>
	call void @llvm.amdgcn.image.store.2d.v4f16.i16(<4 x half> %bitcast, i32 1, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.2d.v4f16.i16(<4 x half> %bitcast, i32 1, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v2f16.2d:			define amdgpu_ps void @store_v2f16_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <2 x i32> %val) {
	; GCN: image_store v[1:2], v0, s[0:7] dmask:0x3 unorm a16 d16			; GFX9-LABEL: store_v2f16_2d:
	define amdgpu_ps void @store.v2f16.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <2 x i32> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[1:2], v0, s[0:7] dmask:0x3 unorm a16 d16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v2f16_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[1:2], v0, s[0:7] dmask:0x3 dim:SQ_RSRC_IMG_2D unorm a16 d16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%bitcast = bitcast <2 x i32> %val to <4 x half>			%bitcast = bitcast <2 x i32> %val to <4 x half>
	call void @llvm.amdgcn.image.store.2d.v4f16.i16(<4 x half> %bitcast, i32 3, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.2d.v4f16.i16(<4 x half> %bitcast, i32 3, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v3f16.2d:			define amdgpu_ps void @store_v3f16_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <2 x i32> %val) {
	; GCN: image_store v[1:2], v0, s[0:7] dmask:0x7 unorm a16 d16			; GFX9-LABEL: store_v3f16_2d:
	define amdgpu_ps void @store.v3f16.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <2 x i32> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[1:2], v0, s[0:7] dmask:0x7 unorm a16 d16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v3f16_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[1:2], v0, s[0:7] dmask:0x7 dim:SQ_RSRC_IMG_2D unorm a16 d16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%bitcast = bitcast <2 x i32> %val to <4 x half>			%bitcast = bitcast <2 x i32> %val to <4 x half>
	call void @llvm.amdgcn.image.store.2d.v4f16.i16(<4 x half> %bitcast, i32 7, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.2d.v4f16.i16(<4 x half> %bitcast, i32 7, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v4f16.2d:			define amdgpu_ps void @store_v4f16_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <2 x i32> %val) {
	; GCN: image_store v[1:2], v0, s[0:7] dmask:0xf unorm a16 d16			; GFX9-LABEL: store_v4f16_2d:
	define amdgpu_ps void @store.v4f16.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <2 x i32> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[1:2], v0, s[0:7] dmask:0xf unorm a16 d16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v4f16_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[1:2], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm a16 d16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%bitcast = bitcast <2 x i32> %val to <4 x half>			%bitcast = bitcast <2 x i32> %val to <4 x half>
	call void @llvm.amdgcn.image.store.2d.v4f16.i16(<4 x half> %bitcast, i32 15, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.2d.v4f16.i16(<4 x half> %bitcast, i32 15, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.f16.3d:			define amdgpu_ps void @store_f16_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi, <2 x i32> %val) {
	; GCN: image_store v[2:3], v[0:1], s[0:7] dmask:0x1 unorm a16 d16			; GFX9-LABEL: store_f16_3d:
	define amdgpu_ps void @store.f16.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi, <2 x i32> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[2:3], v[0:1], s[0:7] dmask:0x1 unorm a16 d16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_f16_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[2:3], v[0:1], s[0:7] dmask:0x1 dim:SQ_RSRC_IMG_3D unorm a16 d16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	%bitcast = bitcast <2 x i32> %val to <4 x half>			%bitcast = bitcast <2 x i32> %val to <4 x half>
	call void @llvm.amdgcn.image.store.3d.v4f16.i16(<4 x half> %bitcast, i32 1, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.3d.v4f16.i16(<4 x half> %bitcast, i32 1, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v2f16.3d:			define amdgpu_ps void @store_v2f16_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi, <2 x i32> %val) {
	; GCN: image_store v[2:3], v[0:1], s[0:7] dmask:0x3 unorm a16 d16			; GFX9-LABEL: store_v2f16_3d:
	define amdgpu_ps void @store.v2f16.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi, <2 x i32> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[2:3], v[0:1], s[0:7] dmask:0x3 unorm a16 d16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v2f16_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[2:3], v[0:1], s[0:7] dmask:0x3 dim:SQ_RSRC_IMG_3D unorm a16 d16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	%bitcast = bitcast <2 x i32> %val to <4 x half>			%bitcast = bitcast <2 x i32> %val to <4 x half>
	call void @llvm.amdgcn.image.store.3d.v4f16.i16(<4 x half> %bitcast, i32 3, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.3d.v4f16.i16(<4 x half> %bitcast, i32 3, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v3f16.3d:			define amdgpu_ps void @store_v3f16_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi, <2 x i32> %val) {
	; GCN: image_store v[2:3], v[0:1], s[0:7] dmask:0x7 unorm a16 d16			; GFX9-LABEL: store_v3f16_3d:
	define amdgpu_ps void @store.v3f16.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi, <2 x i32> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[2:3], v[0:1], s[0:7] dmask:0x7 unorm a16 d16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v3f16_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[2:3], v[0:1], s[0:7] dmask:0x7 dim:SQ_RSRC_IMG_3D unorm a16 d16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	%bitcast = bitcast <2 x i32> %val to <4 x half>			%bitcast = bitcast <2 x i32> %val to <4 x half>
	call void @llvm.amdgcn.image.store.3d.v4f16.i16(<4 x half> %bitcast, i32 7, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.3d.v4f16.i16(<4 x half> %bitcast, i32 7, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v4f16.3d:			define amdgpu_ps void @store_v4f16_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi, <2 x i32> %val) {
	; GCN: image_store v[2:3], v[0:1], s[0:7] dmask:0xf unorm a16 d16			; GFX9-LABEL: store_v4f16_3d:
	define amdgpu_ps void @store.v4f16.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi, <2 x i32> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[2:3], v[0:1], s[0:7] dmask:0xf unorm a16 d16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v4f16_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[2:3], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_3D unorm a16 d16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	%bitcast = bitcast <2 x i32> %val to <4 x half>			%bitcast = bitcast <2 x i32> %val to <4 x half>
	call void @llvm.amdgcn.image.store.3d.v4f16.i16(<4 x half> %bitcast, i32 15, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.3d.v4f16.i16(<4 x half> %bitcast, i32 15, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	declare void @llvm.amdgcn.image.store.1d.v4f16.i16(<4 x half>, i32, i16, <8 x i32>, i32, i32) #2			declare void @llvm.amdgcn.image.store.1d.v4f16.i16(<4 x half>, i32, i16, <8 x i32>, i32, i32) #2
	declare void @llvm.amdgcn.image.store.2d.v4f16.i16(<4 x half>, i32, i16, i16, <8 x i32>, i32, i32) #2			declare void @llvm.amdgcn.image.store.2d.v4f16.i16(<4 x half>, i32, i16, i16, <8 x i32>, i32, i32) #2
	declare void @llvm.amdgcn.image.store.3d.v4f16.i16(<4 x half>, i32, i16, i16, i16, <8 x i32>, i32, i32) #2			declare void @llvm.amdgcn.image.store.3d.v4f16.i16(<4 x half>, i32, i16, i16, i16, <8 x i32>, i32, i32) #2

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readonly }			attributes #1 = { nounwind readonly }

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.store.a16.ll

	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN %s			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX9 %s
	; GCN-LABEL: {{^}}store.f32.1d:			; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX10 %s
	; GCN: image_store v[1:4], v0, s[0:7] dmask:0x1 unorm a16
	define amdgpu_ps void @store.f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <4 x float> %val) {			define amdgpu_ps void @store_f32_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <4 x float> %val) {
				; GFX9-LABEL: store_f32_1d:
				; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[1:4], v0, s[0:7] dmask:0x1 unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_f32_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[1:4], v0, s[0:7] dmask:0x1 dim:SQ_RSRC_IMG_1D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %val, i32 1, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %val, i32 1, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v2f32.1d:			define amdgpu_ps void @store_v2f32_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <4 x float> %val) {
	; GCN: image_store v[1:4], v0, s[0:7] dmask:0x3 unorm a16			; GFX9-LABEL: store_v2f32_1d:
	define amdgpu_ps void @store.v2f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <4 x float> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[1:4], v0, s[0:7] dmask:0x3 unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v2f32_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[1:4], v0, s[0:7] dmask:0x3 dim:SQ_RSRC_IMG_1D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %val, i32 3, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %val, i32 3, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v3f32.1d:			define amdgpu_ps void @store_v3f32_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <4 x float> %val) {
	; GCN: image_store v[1:4], v0, s[0:7] dmask:0x7 unorm a16			; GFX9-LABEL: store_v3f32_1d:
	define amdgpu_ps void @store.v3f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <4 x float> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[1:4], v0, s[0:7] dmask:0x7 unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v3f32_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[1:4], v0, s[0:7] dmask:0x7 dim:SQ_RSRC_IMG_1D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %val, i32 7, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %val, i32 7, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v4f32.1d:			define amdgpu_ps void @store_v4f32_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <4 x float> %val) {
	; GCN: image_store v[1:4], v0, s[0:7] dmask:0xf unorm a16			; GFX9-LABEL: store_v4f32_1d:
	define amdgpu_ps void @store.v4f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <4 x float> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[1:4], v0, s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v4f32_1d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[1:4], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_1D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %val, i32 15, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %val, i32 15, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.f32.2d:			define amdgpu_ps void @store_f32_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <4 x float> %val) {
	; GCN: image_store v[1:4], v0, s[0:7] dmask:0x1 unorm a16			; GFX9-LABEL: store_f32_2d:
	define amdgpu_ps void @store.f32.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <4 x float> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[1:4], v0, s[0:7] dmask:0x1 unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_f32_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[1:4], v0, s[0:7] dmask:0x1 dim:SQ_RSRC_IMG_2D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	call void @llvm.amdgcn.image.store.2d.v4f32.i16(<4 x float> %val, i32 1, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.2d.v4f32.i16(<4 x float> %val, i32 1, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v2f32.2d:			define amdgpu_ps void @store_v2f32_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <4 x float> %val) {
	; GCN: image_store v[1:4], v0, s[0:7] dmask:0x3 unorm a16			; GFX9-LABEL: store_v2f32_2d:
	define amdgpu_ps void @store.v2f32.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <4 x float> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[1:4], v0, s[0:7] dmask:0x3 unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v2f32_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[1:4], v0, s[0:7] dmask:0x3 dim:SQ_RSRC_IMG_2D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	call void @llvm.amdgcn.image.store.2d.v4f32.i16(<4 x float> %val, i32 3, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.2d.v4f32.i16(<4 x float> %val, i32 3, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v3f32.2d:			define amdgpu_ps void @store_v3f32_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <4 x float> %val) {
	; GCN: image_store v[1:4], v0, s[0:7] dmask:0x7 unorm a16			; GFX9-LABEL: store_v3f32_2d:
	define amdgpu_ps void @store.v3f32.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <4 x float> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[1:4], v0, s[0:7] dmask:0x7 unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v3f32_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[1:4], v0, s[0:7] dmask:0x7 dim:SQ_RSRC_IMG_2D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	call void @llvm.amdgcn.image.store.2d.v4f32.i16(<4 x float> %val, i32 7, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.2d.v4f32.i16(<4 x float> %val, i32 7, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v4f32.2d:			define amdgpu_ps void @store_v4f32_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <4 x float> %val) {
	; GCN: image_store v[1:4], v0, s[0:7] dmask:0xf unorm a16			; GFX9-LABEL: store_v4f32_2d:
	define amdgpu_ps void @store.v4f32.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords, <4 x float> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[1:4], v0, s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v4f32_2d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[1:4], v0, s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	call void @llvm.amdgcn.image.store.2d.v4f32.i16(<4 x float> %val, i32 15, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.2d.v4f32.i16(<4 x float> %val, i32 15, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.f32.3d:			define amdgpu_ps void @store_f32_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi, <4 x float> %val) {
	; GCN: image_store v[2:5], v[0:1], s[0:7] dmask:0x1 unorm a16			; GFX9-LABEL: store_f32_3d:
	define amdgpu_ps void @store.f32.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi, <4 x float> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[2:5], v[0:1], s[0:7] dmask:0x1 unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_f32_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[2:5], v[0:1], s[0:7] dmask:0x1 dim:SQ_RSRC_IMG_3D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	call void @llvm.amdgcn.image.store.3d.v4f32.i16(<4 x float> %val, i32 1, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.3d.v4f32.i16(<4 x float> %val, i32 1, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v2f32.3d:			define amdgpu_ps void @store_v2f32_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi, <4 x float> %val) {
	; GCN: image_store v[2:5], v[0:1], s[0:7] dmask:0x3 unorm a16			; GFX9-LABEL: store_v2f32_3d:
	define amdgpu_ps void @store.v2f32.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi, <4 x float> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[2:5], v[0:1], s[0:7] dmask:0x3 unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v2f32_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[2:5], v[0:1], s[0:7] dmask:0x3 dim:SQ_RSRC_IMG_3D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	call void @llvm.amdgcn.image.store.3d.v4f32.i16(<4 x float> %val, i32 3, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.3d.v4f32.i16(<4 x float> %val, i32 3, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v3f32.3d:			define amdgpu_ps void @store_v3f32_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi, <4 x float> %val) {
	; GCN: image_store v[2:5], v[0:1], s[0:7] dmask:0x7 unorm a16			; GFX9-LABEL: store_v3f32_3d:
	define amdgpu_ps void @store.v3f32.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi, <4 x float> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[2:5], v[0:1], s[0:7] dmask:0x7 unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v3f32_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[2:5], v[0:1], s[0:7] dmask:0x7 dim:SQ_RSRC_IMG_3D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	call void @llvm.amdgcn.image.store.3d.v4f32.i16(<4 x float> %val, i32 7, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.3d.v4f32.i16(<4 x float> %val, i32 7, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store.v4f32.3d:			define amdgpu_ps void @store_v4f32_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi, <4 x float> %val) {
	; GCN: image_store v[2:5], v[0:1], s[0:7] dmask:0xf unorm a16			; GFX9-LABEL: store_v4f32_3d:
	define amdgpu_ps void @store.v4f32.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi, <4 x float> %val) {			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: image_store v[2:5], v[0:1], s[0:7] dmask:0xf unorm a16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_v4f32_3d:
				; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: image_store v[2:5], v[0:1], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_3D unorm a16
				; GFX10-NEXT: s_endpgm
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	call void @llvm.amdgcn.image.store.3d.v4f32.i16(<4 x float> %val, i32 15, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.3d.v4f32.i16(<4 x float> %val, i32 15, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	declare void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float>, i32, i16, <8 x i32>, i32, i32) #2			declare void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float>, i32, i16, <8 x i32>, i32, i32) #2
	declare void @llvm.amdgcn.image.store.2d.v4f32.i16(<4 x float>, i32, i16, i16, <8 x i32>, i32, i32) #2			declare void @llvm.amdgcn.image.store.2d.v4f32.i16(<4 x float>, i32, i16, i16, <8 x i32>, i32, i32) #2
	declare void @llvm.amdgcn.image.store.3d.v4f32.i16(<4 x float>, i32, i16, i16, i16, <8 x i32>, i32, i32) #2			declare void @llvm.amdgcn.image.store.3d.v4f32.i16(<4 x float>, i32, i16, i16, i16, <8 x i32>, i32, i32) #2

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readonly }			attributes #1 = { nounwind readonly }

llvm/test/CodeGen/AMDGPU/mcp-overlap-after-propagation.mir

body: \|
bb.0:		bb.0:
successors:		successors:
liveins: $sgpr2, $sgpr3, $sgpr96, $sgpr97, $sgpr98, $sgpr99, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr5, $vgpr70, $vgpr71		liveins: $sgpr2, $sgpr3, $sgpr96, $sgpr97, $sgpr98, $sgpr99, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr5, $vgpr70, $vgpr71

renamable $sgpr8_sgpr9 = S_GETPC_B64		renamable $sgpr8_sgpr9 = S_GETPC_B64
renamable $sgpr8 = COPY killed renamable $sgpr2		renamable $sgpr8 = COPY killed renamable $sgpr2
renamable $sgpr60_sgpr61_sgpr62_sgpr63_sgpr64_sgpr65_sgpr66_sgpr67 = S_LOAD_DWORDX8_IMM renamable $sgpr8_sgpr9, 144, 0, 0 :: (invariant load 32, align 16, addrspace 4)		renamable $sgpr60_sgpr61_sgpr62_sgpr63_sgpr64_sgpr65_sgpr66_sgpr67 = S_LOAD_DWORDX8_IMM renamable $sgpr8_sgpr9, 144, 0, 0 :: (invariant load 32, align 16, addrspace 4)
renamable $sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95 = COPY killed renamable $sgpr60_sgpr61_sgpr62_sgpr63_sgpr64_sgpr65_sgpr66_sgpr67		renamable $sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95 = COPY killed renamable $sgpr60_sgpr61_sgpr62_sgpr63_sgpr64_sgpr65_sgpr66_sgpr67
renamable $vgpr4 = IMAGE_GET_LOD_V1_V2_gfx10 renamable $vgpr70_vgpr71, renamable $sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95, renamable $sgpr96_sgpr97_sgpr98_sgpr99, 2, 1, 0, 0, 0, 0, 0, 0, 0, implicit $exec		renamable $vgpr4 = IMAGE_GET_LOD_V1_V2_gfx10 renamable $vgpr70_vgpr71, renamable $sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95, renamable $sgpr96_sgpr97_sgpr98_sgpr99, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec
renamable $sgpr56_sgpr57_sgpr58_sgpr59_sgpr60_sgpr61_sgpr62_sgpr63 = COPY killed renamable $sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95		renamable $sgpr56_sgpr57_sgpr58_sgpr59_sgpr60_sgpr61_sgpr62_sgpr63 = COPY killed renamable $sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95
renamable $vgpr12_vgpr13_vgpr14 = IMAGE_SAMPLE_V3_V2_gfx10 renamable $vgpr70_vgpr71, renamable $sgpr56_sgpr57_sgpr58_sgpr59_sgpr60_sgpr61_sgpr62_sgpr63, renamable $sgpr96_sgpr97_sgpr98_sgpr99, 7, 1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16)		renamable $vgpr12_vgpr13_vgpr14 = IMAGE_SAMPLE_V3_V2_gfx10 renamable $vgpr70_vgpr71, renamable $sgpr56_sgpr57_sgpr58_sgpr59_sgpr60_sgpr61_sgpr62_sgpr63, renamable $sgpr96_sgpr97_sgpr98_sgpr99, 7, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 12, align 16)
S_ENDPGM 0		S_ENDPGM 0

...		...

llvm/test/CodeGen/AMDGPU/nsa-vmem-hazard.mir

	# RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs -run-pass post-RA-hazard-rec -o - %s \| FileCheck -check-prefix=GCN %s			# RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs -run-pass post-RA-hazard-rec -o - %s \| FileCheck -check-prefix=GCN %s

	# GCN-LABEL: name: hazard_image_sample_d_buf_off6			# GCN-LABEL: name: hazard_image_sample_d_buf_off6
	# GCN: IMAGE_SAMPLE			# GCN: IMAGE_SAMPLE
	# GCN-NEXT: S_NOP 0			# GCN-NEXT: S_NOP 0
	# GCN-NEXT: BUFFER_LOAD_DWORD_OFFSET			# GCN-NEXT: BUFFER_LOAD_DWORD_OFFSET
	---			---
	name: hazard_image_sample_d_buf_off6			name: hazard_image_sample_d_buf_off6
	body: \|			body: \|
	bb.0:			bb.0:
	$vgpr0_vgpr1_vgpr2_vgpr3 = IMAGE_SAMPLE_D_V4_V9_nsa_gfx10 undef $vgpr3, undef $vgpr8, undef $vgpr7, undef $vgpr5, undef $vgpr4, undef $vgpr6, undef $vgpr0, undef $vgpr2, undef $vgpr2, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, undef $sgpr8_sgpr9_sgpr10_sgpr11, 15, 2, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec			$vgpr0_vgpr1_vgpr2_vgpr3 = IMAGE_SAMPLE_D_V4_V9_nsa_gfx10 undef $vgpr3, undef $vgpr8, undef $vgpr7, undef $vgpr5, undef $vgpr4, undef $vgpr6, undef $vgpr0, undef $vgpr2, undef $vgpr2, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, undef $sgpr8_sgpr9_sgpr10_sgpr11, 15, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec
	$vgpr1 = BUFFER_LOAD_DWORD_OFFSET undef $sgpr0_sgpr1_sgpr2_sgpr3, undef $sgpr4, 6, 0, 0, 0, 0, 0, implicit $exec			$vgpr1 = BUFFER_LOAD_DWORD_OFFSET undef $sgpr0_sgpr1_sgpr2_sgpr3, undef $sgpr4, 6, 0, 0, 0, 0, 0, implicit $exec
	...			...

	# GCN-LABEL: name: no_hazard_image_sample_d_buf_off1			# GCN-LABEL: name: no_hazard_image_sample_d_buf_off1
	# GCN: IMAGE_SAMPLE			# GCN: IMAGE_SAMPLE
	# GCN-NEXT: BUFFER_LOAD_DWORD_OFFSET			# GCN-NEXT: BUFFER_LOAD_DWORD_OFFSET
	---			---
	name: no_hazard_image_sample_d_buf_off1			name: no_hazard_image_sample_d_buf_off1
	body: \|			body: \|
	bb.0:			bb.0:
	$vgpr0_vgpr1_vgpr2_vgpr3 = IMAGE_SAMPLE_D_V4_V9_nsa_gfx10 undef $vgpr3, undef $vgpr8, undef $vgpr7, undef $vgpr5, undef $vgpr4, undef $vgpr6, undef $vgpr0, undef $vgpr2, undef $vgpr2, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, undef $sgpr8_sgpr9_sgpr10_sgpr11, 15, 2, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec			$vgpr0_vgpr1_vgpr2_vgpr3 = IMAGE_SAMPLE_D_V4_V9_nsa_gfx10 undef $vgpr3, undef $vgpr8, undef $vgpr7, undef $vgpr5, undef $vgpr4, undef $vgpr6, undef $vgpr0, undef $vgpr2, undef $vgpr2, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, undef $sgpr8_sgpr9_sgpr10_sgpr11, 15, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec
	$vgpr1 = BUFFER_LOAD_DWORD_OFFSET undef $sgpr0_sgpr1_sgpr2_sgpr3, undef $sgpr4, 1, 0, 0, 0, 0, 0, implicit $exec			$vgpr1 = BUFFER_LOAD_DWORD_OFFSET undef $sgpr0_sgpr1_sgpr2_sgpr3, undef $sgpr4, 1, 0, 0, 0, 0, 0, implicit $exec
	...			...

	# GCN-LABEL: name: no_hazard_image_sample_d_buf_far			# GCN-LABEL: name: no_hazard_image_sample_d_buf_far
	# GCN: IMAGE_SAMPLE			# GCN: IMAGE_SAMPLE
	# GCN-NEXT: V_NOP_e32			# GCN-NEXT: V_NOP_e32
	# GCN-NEXT: BUFFER_LOAD_DWORD_OFFSET			# GCN-NEXT: BUFFER_LOAD_DWORD_OFFSET
	---			---
	name: no_hazard_image_sample_d_buf_far			name: no_hazard_image_sample_d_buf_far
	body: \|			body: \|
	bb.0:			bb.0:
	$vgpr0_vgpr1_vgpr2_vgpr3 = IMAGE_SAMPLE_D_V4_V9_nsa_gfx10 undef $vgpr3, undef $vgpr8, undef $vgpr7, undef $vgpr5, undef $vgpr4, undef $vgpr6, undef $vgpr0, undef $vgpr2, undef $vgpr2, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, undef $sgpr8_sgpr9_sgpr10_sgpr11, 15, 2, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec			$vgpr0_vgpr1_vgpr2_vgpr3 = IMAGE_SAMPLE_D_V4_V9_nsa_gfx10 undef $vgpr3, undef $vgpr8, undef $vgpr7, undef $vgpr5, undef $vgpr4, undef $vgpr6, undef $vgpr0, undef $vgpr2, undef $vgpr2, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, undef $sgpr8_sgpr9_sgpr10_sgpr11, 15, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec
	V_NOP_e32 implicit $exec			V_NOP_e32 implicit $exec
	$vgpr1 = BUFFER_LOAD_DWORD_OFFSET undef $sgpr0_sgpr1_sgpr2_sgpr3, undef $sgpr4, 6, 0, 0, 0, 0, 0, implicit $exec			$vgpr1 = BUFFER_LOAD_DWORD_OFFSET undef $sgpr0_sgpr1_sgpr2_sgpr3, undef $sgpr4, 6, 0, 0, 0, 0, 0, implicit $exec
	...			...

	# Non-NSA			# Non-NSA
	# GCN-LABEL: name: no_hazard_image_sample_v4_v2_buf_off6			# GCN-LABEL: name: no_hazard_image_sample_v4_v2_buf_off6
	# GCN: IMAGE_SAMPLE			# GCN: IMAGE_SAMPLE
	# GCN-NEXT: BUFFER_LOAD_DWORD_OFFSET			# GCN-NEXT: BUFFER_LOAD_DWORD_OFFSET
	---			---
	name: no_hazard_image_sample_v4_v2_buf_off6			name: no_hazard_image_sample_v4_v2_buf_off6
	body: |			body: |
	bb.0:			bb.0:
	$vgpr0_vgpr1_vgpr2_vgpr3 = IMAGE_SAMPLE_V4_V2_gfx10 undef $vgpr1_vgpr2, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, undef $sgpr8_sgpr9_sgpr10_sgpr11, 15, 1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec			$vgpr0_vgpr1_vgpr2_vgpr3 = IMAGE_SAMPLE_V4_V2_gfx10 undef $vgpr1_vgpr2, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, undef $sgpr8_sgpr9_sgpr10_sgpr11, 15, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec
	$vgpr1 = BUFFER_LOAD_DWORD_OFFSET undef $sgpr0_sgpr1_sgpr2_sgpr3, undef $sgpr4, 6, 0, 0, 0, 0, 0, implicit $exec			$vgpr1 = BUFFER_LOAD_DWORD_OFFSET undef $sgpr0_sgpr1_sgpr2_sgpr3, undef $sgpr4, 6, 0, 0, 0, 0, 0, implicit $exec
	...			...

	# Less than 4 dwords			# Less than 4 dwords
	# GCN-LABEL: name: no_hazard_image_sample_v4_v3_buf_off6			# GCN-LABEL: name: no_hazard_image_sample_v4_v3_buf_off6
	# GCN: IMAGE_SAMPLE			# GCN: IMAGE_SAMPLE
	# GCN-NEXT: BUFFER_LOAD_DWORD_OFFSET			# GCN-NEXT: BUFFER_LOAD_DWORD_OFFSET
	---			---
	name: no_hazard_image_sample_v4_v3_buf_off6			name: no_hazard_image_sample_v4_v3_buf_off6
	body: |			body: |
	bb.0:			bb.0:
	$vgpr0_vgpr1_vgpr2_vgpr3 = IMAGE_SAMPLE_V4_V3_nsa_gfx10 undef $vgpr1, undef $vgpr2, undef $vgpr0, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, undef $sgpr8_sgpr9_sgpr10_sgpr11, 15, 2, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec			$vgpr0_vgpr1_vgpr2_vgpr3 = IMAGE_SAMPLE_V4_V3_nsa_gfx10 undef $vgpr1, undef $vgpr2, undef $vgpr0, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, undef $sgpr8_sgpr9_sgpr10_sgpr11, 15, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec
	$vgpr1 = BUFFER_LOAD_DWORD_OFFSET undef $sgpr0_sgpr1_sgpr2_sgpr3, undef $sgpr4, 6, 0, 0, 0, 0, 0, implicit $exec			$vgpr1 = BUFFER_LOAD_DWORD_OFFSET undef $sgpr0_sgpr1_sgpr2_sgpr3, undef $sgpr4, 6, 0, 0, 0, 0, 0, implicit $exec
	...			...

Revision Contents

Diff 242839

llvm/lib/Target/AMDGPU/AMDGPU.td

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp

llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h

llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp

llvm/lib/Target/AMDGPU/MIMGInstructions.td

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

llvm/lib/Target/AMDGPU/SIInstrFormats.td

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

llvm/lib/Target/AMDGPU/SIInstrInfo.td

llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.dim.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.a16.encode.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.load.a16.d16.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.load.a16.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.a16.dim.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.store.a16.d16.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.store.a16.ll

llvm/test/CodeGen/AMDGPU/mcp-overlap-after-propagation.mir

llvm/test/CodeGen/AMDGPU/nsa-vmem-hazard.mir
