This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Legalize a16 images
Closed, Public

Authored by arsenm on Jan 26 2020, 7:58 PM.

Details

Reviewers

nhaehnle
kerbowa

Summary

Pack the address registers in the legalizer. Avoid introducing a huge
family of new intermediate operations by filling dead operands with
noreg.
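
The scheme in the summary can be illustrated with a rough standalone sketch. This is not the patch's code: it models each 16-bit address component as a `uint16_t`, packs pairs into dwords, pads an odd count, and keeps the operand count fixed by marking the leftover slots as dead. The names `packA16`, `rewriteAddrOperands`, and `AddrSlot` are hypothetical, as are the low-half-first element order and the choice of 0 as padding (the patch pads with an undef s16).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a <2 x s16> register: element 0 in the low half,
// element 1 in the high half (an assumption of this sketch).
using PackedDword = std::uint32_t;

// Hypothetical model of one address operand after legalization.
struct AddrSlot {
  bool IsNoReg;      // true models $noreg, i.e. a dead operand
  PackedDword Value; // meaningful only when IsNoReg is false
};

// Pack 16-bit address components pairwise into dwords. An odd count is rounded
// up with a filler element; 0 is used here only to keep the result
// deterministic.
std::vector<PackedDword> packA16(const std::vector<std::uint16_t> &Addrs) {
  std::vector<std::uint16_t> Padded = Addrs;
  if (Padded.size() % 2 != 0)
    Padded.push_back(0);

  std::vector<PackedDword> Packed(Padded.size() / 2);
  for (std::size_t I = 0; I != Packed.size(); ++I)
    Packed[I] = PackedDword(Padded[2 * I]) |
                (PackedDword(Padded[2 * I + 1]) << 16);
  return Packed;
}

// Keep the operand count unchanged: leading slots take the packed dwords and
// the now-dead trailing slots are marked as $noreg.
std::vector<AddrSlot>
rewriteAddrOperands(const std::vector<std::uint16_t> &Addrs) {
  const std::vector<PackedDword> Packed = packA16(Addrs);
  assert(Packed.size() <= Addrs.size() && "packing never adds operands");

  std::vector<AddrSlot> Slots(Addrs.size(), AddrSlot{true, 0});
  for (std::size_t I = 0; I != Packed.size(); ++I)
    Slots[I] = AddrSlot{false, Packed[I]};
  return Slots;
}
```

For three components this yields two packed dwords followed by one dead slot, so the intrinsic keeps its original operand count and no new family of opcodes is needed.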

Diff Detail

Event Timeline

arsenm created this revision. Jan 26 2020, 7:58 PM

Herald added a project: Restricted Project. Jan 26 2020, 7:58 PM

Herald added subscribers: Petar.Avramovic, jfb, hiraditya and 8 others.

arsenm added a parent revision: D73444: AMDGPU/GlobalISel: Legalize TFE image result loads. Jan 26 2020, 7:58 PM

arsenm added a child revision: D73447: AMDGPU/GlobalISel: Legalize non-a16 non-NSA images. Jan 26 2020, 8:08 PM

Why is this conceptually a legalization rather than part of the instruction selection?

In D73446#1842422, @nhaehnle wrote:

Why is this conceptually a legalization rather than part of the instruction selection?

The register layout packing/unpacking code should be exposed to the legalizer and combines. Doing it later means it could possibly be in a waterfall loop, and won't benefit from the combines on the pack/unpack code.

In D73446#1842443, @arsenm wrote:

The register layout packing/unpacking code should be exposed to the legalizer and combines. Doing it later means it could possibly be in a waterfall loop, and won't benefit from the combines on the pack/unpack code.

More abstractly, selection should operate on the types expected for the final instruction. It's the legalizer's job to get registers that are the right type/size

In D73446#1842457, @arsenm wrote:

More abstractly, selection should operate on the types expected for the final instruction. It's the legalizer's job to get registers that are the right type/size

The part that makes me uncomfortable here and I couldn't quite put a finger on initially is that it implies a change in the semantics of the intrinsic. What this and related changes are doing is implicitly changing the design such that the intrinsics mean something different before vs. after legalization. That is not what legalization is usually supposed to do.

I see your point about combiner passes -- so could we perhaps just select the image instructions early instead? It's not like we have very interesting things happening for image instructions in the ISel patterns anyway, and one of the benefits of the GlobalISel infrastructure is that it's supposed to be flexible enough for stuff like that...

In D73446#1846283, @nhaehnle wrote:

The part that makes me uncomfortable here and I couldn't quite put a finger on initially is that it implies a change in the semantics of the intrinsic. What this and related changes are doing is implicitly changing the design such that the intrinsics mean something different before vs. after legalization. That is not what legalization is usually supposed to do.

Ideally we should have separate intermediate image opcodes. However, I have no interest in creating another giant set of image opcodes that will need more searchable tables. I don't think it's really a semantic change, and it should be possible to go backwards from the legalized intrinsic to what it was originally. What I am worried about is what happens if you run the legalizer twice, which should be OK but I'm ignoring this problem for now until I have the full selection working. Two options I've considered are re-using some of the extra bits in one of the immediate arguments to encode how it's been changed, or to use a variadic wrapper instruction which will just capture the intrinsic ID and how it was legalized.
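
The first option mentioned above (spare bits in an immediate argument) could look roughly like the following standalone sketch. Everything in it is hypothetical: the 16-bit payload width, the bit position, and the names `ImageOp`, `A16PackedBit`, and `legalizeA16Once` are illustrative assumptions, not anything taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: the immediate's real payload fits in the low 16 bits, so the
// bits above it are free to record what the legalizer has already done.
constexpr std::uint32_t PayloadMask = 0xFFFFu;
constexpr std::uint32_t A16PackedBit = 1u << 16; // addresses already packed

// Hypothetical model of an image intrinsic: one immediate operand plus a
// counter standing in for the repacking work.
struct ImageOp {
  std::uint32_t Imm;
  int PackCount;
};

// Pack the addresses at most once. Returns true if this call changed the
// operation, false if the marker bit shows it was already legalized.
bool legalizeA16Once(ImageOp &Op) {
  if (Op.Imm & A16PackedBit)
    return false;
  ++Op.PackCount; // stand-in for the actual address repacking
  Op.Imm |= A16PackedBit;
  return true;
}

// Recover the original immediate value, ignoring the marker bits.
std::uint32_t payload(const ImageOp &Op) { return Op.Imm & PayloadMask; }
```

With a marker like this, a second legalizer run sees the bit and leaves the operands alone, which addresses the double-legalization worry without adding opcodes.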

I see your point about combiner passes -- so could we perhaps just select the image instructions early instead? It's not like we have very interesting things happening for image instructions in the ISel patterns anyway, and one of the benefits of the GlobalISel infrastructure is that it's supposed to be flexible enough for stuff like that...

I think the only reasonable place outside of selection to do this would be in RegBankSelect/applyMappingImpl, which I did consider before going with the legalizer. We still need to RegBankSelect the intrinsics, and possibly move them into a waterfall loop, and I think introducing real register class constraints earlier is generally undesirable. Eventually, we should run some combiner pass after which should take care of packing code. We could theoretically trick RegBankSelect into handling the selected instructions, but that would also be pretty disgusting. Another small issue with this is it sort of implies making the full set of legalization artifacts legal for <3 x f16> cases. I started working towards this in D72639, but I don't really like it and would prefer if these were all legalized to <4 x s16> during legalization. I think we could use the artifact combiner to eliminate the illegal register types in RegBankSelect, but I don't think that should be mandatory. We claim some of these are legal today, but that's mostly a hack to deal with missing features in the legalizer.

arsenm added a child revision: D73666: AMDGPU/GlobalISel: Adjust image load register type based on dmask. Jan 29 2020, 2:31 PM

In D73446#1846711, @arsenm wrote:

Ideally we should have separate intermediate image opcodes. However, I have no interest in creating another giant set of image opcodes that will need more searchable tables. I don't think it's really a semantic change, and it should be possible to go backwards from the legalized intrinsic to what it was originally. What I am worried about is what happens if you run the legalizer twice, which should be OK but I'm ignoring this problem for now until I have the full selection working. Two options I've considered are re-using some of the extra bits in one of the immediate arguments to encode how it's been changed, or to use a variadic wrapper instruction which will just capture the intrinsic ID and how it was legalized.

Big "no" on another set of image opcodes from my side as well.

I think the only reasonable place outside of selection to do this would be in RegBankSelect/applyMappingImpl, which I did consider before going with the legalizer. We still need to RegBankSelect the intrinsics, and possibly move them into a waterfall loop, and I think introducing real register class constraints earlier is generally undesirable. Eventually, we should run some combiner pass after which should take care of packing code. We could theoretically trick RegBankSelect into handling the selected instructions, but that would also be pretty disgusting. Another small issue with this is it sort of implies making the full set of legalization artifacts legal for <3 x f16> cases. I started working towards this in D72639, but I don't really like it and would prefer if these were all legalized to <4 x s16> during legalization. I think we could use the artifact combiner to eliminate the illegal register types in RegBankSelect, but I don't think that should be mandatory. We claim some of these are legal today, but that's mostly a hack to deal with missing features in the legalizer.

Part of the problem here is that the final machine instructions have register class constraints in the first place. I've been wondering for some time now what register classes even buy us in the end. It seems to me that they're largely useless, and almost everything we need from a conceptual point of view is contained in the register banks. The few complications around M0, SCC, VCC, could be dealt with explicitly since we largely shouldn't allocate them using a generic approach anyway.

Is there a way to just relax those constraints for a select subset of opcodes, like the image opcodes?

In D73446#1848869, @nhaehnle wrote:

Part of the problem here is that the final machine instructions have register class constraints in the first place. I've been wondering for some time now what register classes even buy us in the end. It seems to me that they're largely useless, and almost everything we need from a conceptual point of view is contained in the register banks. The few complications around M0, SCC, VCC, could be dealt with explicitly since we largely shouldn't allocate them using a generic approach anyway.

The register classes aren't really useless, and we do have a variety of more exotic operand constraints to deal with. We need them to represent cases like operands that don't support M0/exec/whatever, and cases like SReg_96 only supporting one 64-bit subregister. VCC is also a completely normal, allocatable register. The class constraints don't even necessarily matter when allocating, but when folding copies between classes.

Is there a way to just relax those constraints for a select subset of opcodes, like the image opcodes?

I don't know what this really would mean

The register classes aren't really useless, and we do have a variety of more exotic operand constraints to deal with. We need them to represent cases like operands that don't support M0/exec/whatever, and cases like SReg_96 only supporting one 64-bit subregister. VCC is also a completely normal, allocatable register. The class constraints don't even necessarily matter when allocating, but when folding copies between classes.

Conversely, the straightjacket of register classes causes a lot of pain around image instructions and indirect register indexing. They also complicate every single instance of checking whether a register is SGPR or VGPR, which is something that we do quite a lot.

I don't think it's entirely honest to pretend that VCC is a completely normal, allocatable register. VCC use affects code size, which should be taken into account when allocating it. It's also special due to the interaction with VCCZ. Finally, IIRC the scoreboard in gfx10 treats VCC specially, which may also have implications (I haven't fully thought those through though).

The SReg_96 comment is interesting. Where do we end up with SReg_96 in the first place after legalization? I can only think of indirect indexing. Also, how is it different from SReg_128 not allowing you to take sub1_sub2 as a subregister?

Is there a way to just relax those constraints for a select subset of opcodes, like the image opcodes?

I don't know what this really would mean

The thing under discussion here from my perspective is that it's awkward to overload the semantics of image intrinsics in the way that this and related changes are doing, and the question was why we can't just directly go to the final image instructions. One aspect of this is that you'd have a non-generic machine instruction referring to registers that don't have a register class, for a couple of passes at least. That doesn't seem too crazy to me.

In D73446#1853703, @nhaehnle wrote:

Conversely, the straightjacket of register classes causes a lot of pain around image instructions and indirect register indexing. They also complicate every single instance of checking whether a register is SGPR or VGPR, which is something that we do quite a lot.

I don't think it's entirely honest to pretend that VCC is a completely normal, allocatable register. VCC use affects code size, which should be taken into account when allocating it. It's also special due to the interaction with VCCZ. Finally, IIRC the scoreboard in gfx10 treats VCC specially, which may also have implications (I haven't fully thought those through though).

These aren't really constraints, and are merely optimization hints. Trying to treat VCC as different (or only as a physical register) can only penalize code in all of these situations. VCCZ is effectively an alias, and we don't try to make use of it currently. None of these issues impact instruction selection.

The SReg_96 comment is interesting. Where do we end up with SReg_96 in the first place after legalization? I can only think of indirect indexing. Also, how is it different from SReg_128 not allowing you to take sub1_sub2 as a subregister?

Copies involving physical registers, and also inline asm. It's a question of composing subregisters. If you want a sub0_sub1 of an SReg_96, you may have to copy to get a properly aligned register pair. The way we do calling convention lowering today happens to avoid this in normal cases, but I don't want to rely on this kind of behavior. We'll currently get SReg_96 from 96-bit phis, but we could also start legalizing these into 32-bit pieces. The register class constraints aren't directly relevant to this specific problem, as the main reason I want to defer selection from here in the first place is we don't even have the register bank yet. I could start directly selecting in RegBankSelect, but I don't think that's optimal either.

The thing under discussion here from my perspective is that it's awkward to overload the semantics of image intrinsics in the way that this and related changes are doing, and the question was why we can't just directly go to the final image instructions. One aspect of this is that you'd have a non-generic machine instruction referring to registers that don't have a register class, for a couple of passes at least. That doesn't seem too crazy to me.

I'm leaning towards inventing what is essentially a custom G_INTRINSIC type to track the legalization of the awkward cases. The important information will still be tracked by preserving the intrinsic ID operand, but the operands will be changed as here. I think this only requires a small number of wrapper operations (I think 1, but maybe 4 at most). The current intermediate DAG nodes seem to get away with just _d16 variants for dealing with the annoying unpacked register layout case.

Rebase and use observer. I think I can put off a wrapper op a bit longer until the selection patch

Rebase test changes

Fix producing G_CONCAT_VECTORS with single source, which should probably be illegal but ends up selecting just fine
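
One plausible shape for such a fix is to concatenate only when there is more than one packed register. The standalone sketch below models that guard with plain integers as register IDs. The function name `finalizeAddrRegs` and its parameters are hypothetical, and the actual change in the updated diff may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: each int is the ID of a packed <2 x s16> register.
// For the non-NSA encoding, several packed registers are merged into one wide
// register (ConcatResultId stands in for the concat's result). A single packed
// register is passed through untouched, so no single-source concat is built.
std::vector<int> finalizeAddrRegs(std::vector<int> Packed, bool UseNSA,
                                  int ConcatResultId, int &NumConcats) {
  if (!UseNSA && Packed.size() > 1) {
    ++NumConcats; // stand-in for building the concat instruction
    Packed.assign(1, ConcatResultId);
  }
  return Packed;
}
```

In the common two-address a16 case there is exactly one packed dword, so the guard skips the concat entirely.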

ping

Rebase

I'm leaning towards inventing what is essentially a custom G_INTRINSIC type to track the legalization of the awkward cases. The important information will still be tracked by preserving the intrinsic ID operand, but the operands will be changed as here. I think this only requires a small number of wrapper operations (I think 1, but maybe 4 at most). The current intermediate DAG nodes seem to get away with just _d16 variants for dealing with the annoying unpacked register layout case.

I like this idea. I can see how this could be considered a change that is separate from this change, so this one LGTM.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
Line 3126 (nhaehnle): Yes, store instructions don't support TFE to the best of my knowledge. Store instructions can still be used on images that are partially resident, but they simply become no-ops if the destination address isn't mapped.

This revision is now accepted and ready to land. Mar 17 2020, 3:46 AM

2aba9b6cf8a9f816e6be95b96c23c3c9cb692d24

Revision Contents

Path	Size

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp	150 lines

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-llvm.amdgcn.image.atomic.dim.a16.ll	1207 lines

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-llvm.amdgcn.image.dim.a16.ll	3301 lines

Diff 242689

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

Show First 20 Lines • Show All 2,971 Lines • ▼ Show 20 Lines	static void repackUnpackedD16Load(MachineIRBuilder &B, Register DstReg,
int NumOps = Unmerge->getNumOperands() - 1;		int NumOps = Unmerge->getNumOperands() - 1;
SmallVector<Register, 4> RemergeParts(NumOps);		SmallVector<Register, 4> RemergeParts(NumOps);
for (int I = 0; I != NumOps; ++I)		for (int I = 0; I != NumOps; ++I)
RemergeParts[I] = B.buildTrunc(S16, Unmerge.getReg(I)).getReg(0);		RemergeParts[I] = B.buildTrunc(S16, Unmerge.getReg(I)).getReg(0);

B.buildBuildVector(DstReg, RemergeParts);		B.buildBuildVector(DstReg, RemergeParts);
}		}

		/// Turn a set of s16 typed registers in \p A16AddrRegs into a dword sized
		/// vector with s16 typed elements.
		static void packImageA16AddressToDwords(MachineIRBuilder &B,
		MachineInstr &MI,
		SmallVectorImpl<Register> &PackedAddrs,
		int DimIdx,
		int NumVAddrs) {
		const LLT S16 = LLT::scalar(16);
		const LLT V2S16 = LLT::vector(2, 16);

		SmallVector<Register, 8> A16AddrRegs;
		A16AddrRegs.resize(NumVAddrs);

		for (int I = 0; I != NumVAddrs; ++I) {
		A16AddrRegs[I] = MI.getOperand(DimIdx + I).getReg();
		assert(B.getMRI()->getType(A16AddrRegs[I]) == S16);
		}

		// Round to dword.
		if (NumVAddrs % 2 != 0)
		A16AddrRegs.push_back(B.buildUndef(S16).getReg(0));

		PackedAddrs.resize(A16AddrRegs.size() / 2);
		for (int I = 0, E = PackedAddrs.size(); I != E; ++I) {
		PackedAddrs[I] = B.buildBuildVector(
		V2S16, {A16AddrRegs[2 * I], A16AddrRegs[2 * I + 1]}).getReg(0);
		}
		}

		// Return number of address operands in an image intrinsic.
		static int getImageNumVAddr(const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr,
		const AMDGPU::MIMGBaseOpcodeInfo *BaseOpcode) {
		const AMDGPU::MIMGDimInfo *DimInfo
		= AMDGPU::getMIMGDimInfo(ImageDimIntr->Dim);

		int NumGradients = BaseOpcode->Gradients ? DimInfo->NumGradients : 0;
		int NumCoords = BaseOpcode->Coordinates ? DimInfo->NumCoords : 0;
		int NumLCM = BaseOpcode->LodOrClampOrMip ? 1 : 0;
		return BaseOpcode->NumExtraArgs + NumGradients + NumCoords + NumLCM;
		}

		/// Return first address operand index in an image intrinsic.
		static int getImageVAddrIdxBegin(const AMDGPU::MIMGBaseOpcodeInfo *BaseOpcode,
		int NumDefs) {
		if (BaseOpcode->Atomic)
		return NumDefs + 1 + (BaseOpcode->AtomicX2 ? 2 : 1);

		int DMaskIdx = NumDefs + 1 + (BaseOpcode->Store ? 1 : 0);
		return DMaskIdx + 1;
		}

		/// Rewrite image intrinsics to use register layouts expected by the subtarget.
		///
		/// Depending on the subtarget, load/store with 16-bit element data need to be
		/// rewritten to use the low half of 32-bit registers, or directly use a packed
		/// layout. 16-bit addresses should also sometimes be packed into 32-bit
		/// registers.
		///
		/// We don't want to directly select image instructions just yet, but also want
		/// to exposes all register repacking to the legalizer/combiners. We also don't
		/// want a selected instrution entering RegBankSelect. In order to avoid
		/// defining a multitude of intermediate image instructions, directly hack on
		/// the intrinsic's arguments. In cases like a16 addreses, this requires padding
		/// now unnecessary arguments with $noreg.
bool AMDGPULegalizerInfo::legalizeImageIntrinsic(		bool AMDGPULegalizerInfo::legalizeImageIntrinsic(
MachineInstr &MI, MachineIRBuilder &B,		MachineInstr &MI, MachineIRBuilder &B,
GISelChangeObserver &Observer,		GISelChangeObserver &Observer,
const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr) const {		const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr) const {
bool IsTFE = MI.getNumExplicitDefs() == 2;		const int NumDefs = MI.getNumExplicitDefs();
		bool IsTFE = NumDefs == 2;
// We are only processing the operands of d16 image operations on subtargets		// We are only processing the operands of d16 image operations on subtargets
// that use the unpacked register layout, or need to repack the TFE result.		// that use the unpacked register layout, or need to repack the TFE result.

// TODO: Need to handle a16 images too
// TODO: Do we need to guard against already legalized intrinsics?		// TODO: Do we need to guard against already legalized intrinsics?
if (!IsTFE && !ST.hasUnpackedD16VMem())
return true;

const AMDGPU::MIMGBaseOpcodeInfo *BaseOpcode =		const AMDGPU::MIMGBaseOpcodeInfo *BaseOpcode =
AMDGPU::getMIMGBaseOpcodeInfo(ImageDimIntr->BaseOpcode);		AMDGPU::getMIMGBaseOpcodeInfo(ImageDimIntr->BaseOpcode);

if (BaseOpcode->Atomic) // No d16 atomics, or TFE.
return true;

B.setInstr(MI);		B.setInstr(MI);

MachineRegisterInfo *MRI = B.getMRI();		MachineRegisterInfo *MRI = B.getMRI();
const LLT S32 = LLT::scalar(32);		const LLT S32 = LLT::scalar(32);
const LLT S16 = LLT::scalar(16);		const LLT S16 = LLT::scalar(16);

		// Index of first address argument
		const int AddrIdx = getImageVAddrIdxBegin(BaseOpcode, NumDefs);

		// Check for 16 bit addresses and pack if true.
		int DimIdx = AddrIdx + BaseOpcode->NumExtraArgs;
		LLT AddrTy = MRI->getType(MI.getOperand(DimIdx).getReg());
		const bool IsA16 = AddrTy == S16;

		// TODO: Handle NSA vs. non-NSA for non-a16 case.

		// Rewrite the addressing register layout before doing anything else.
		if (IsA16) {
		if (!ST.hasFeature(AMDGPU::FeatureR128A16))
		return false;

		const int NumVAddrs = getImageNumVAddr(ImageDimIntr, BaseOpcode);

		// If the register allocator cannot place the address registers contiguously
		// without introducing moves, then using the non-sequential address encoding
		// is always preferable, since it saves VALU instructions and is usually a
		// wash in terms of code size or even better.
		//
		// However, we currently have no way of hinting to the register allocator
		// that MIMG addresses should be placed contiguously when it is possible to
		// do so, so force non-NSA for the common 2-address case as a heuristic.
		//
		// SIShrinkInstructions will convert NSA encodings to non-NSA after register
		// allocation when possible.
		const bool UseNSA = NumVAddrs >= 3 &&
		ST.hasFeature(AMDGPU::FeatureNSAEncoding);

		if (NumVAddrs > 1) {
		SmallVector<Register, 4> PackedRegs;
		packImageA16AddressToDwords(B, MI, PackedRegs, DimIdx, NumVAddrs);

		if (!UseNSA) {
		LLT PackedAddrTy = LLT::vector(2 * PackedRegs.size(), 16);
		auto Concat = B.buildConcatVectors(PackedAddrTy, PackedRegs);
		PackedRegs[0] = Concat.getReg(0);
		PackedRegs.resize(1);
		}

		// FIXME: We'll notify the observer multiple times if there are further
		// modifications later.
		Observer.changingInstr(MI);

		const int NumPacked = PackedRegs.size();
		for (int I = 0; I != NumVAddrs; ++I) {
		assert(MI.getOperand(DimIdx + I).getReg() != AMDGPU::NoRegister);

		if (I < NumPacked)
		MI.getOperand(DimIdx + I).setReg(PackedRegs[I]);
		else
		MI.getOperand(DimIdx + I).setReg(AMDGPU::NoRegister);
		}

		Observer.changedInstr(MI);
		}
		}

		if (BaseOpcode->Atomic) // No d16 atomics, or TFE.
		return true;

if (BaseOpcode->Store) { // No TFE for stores?		if (BaseOpcode->Store) { // No TFE for stores?
		nhaehnle: Yes, store instructions don't support TFE to the best of my knowledge. Store instructions can still be used on images that are partially resident, but they simply become no-ops if the destination address isn't mapped.
Register VData = MI.getOperand(1).getReg();		Register VData = MI.getOperand(1).getReg();
LLT Ty = MRI->getType(VData);		LLT Ty = MRI->getType(VData);
if (!Ty.isVector() || Ty.getElementType() != S16)		if (!Ty.isVector() || Ty.getElementType() != S16)
return true;		return true;

B.setInstr(MI);		B.setInstr(MI);

		Register RepackedReg = handleD16VData(B, *MRI, VData);
		if (RepackedReg != VData) {
Observer.changingInstr(MI);		Observer.changingInstr(MI);
MI.getOperand(1).setReg(handleD16VData(B, *MRI, VData));		MI.getOperand(1).setReg(RepackedReg);
Observer.changedInstr(MI);		Observer.changedInstr(MI);
		}

return true;		return true;
}		}

Register DstReg = MI.getOperand(0).getReg();		Register DstReg = MI.getOperand(0).getReg();
LLT Ty = MRI->getType(DstReg);		LLT Ty = MRI->getType(DstReg);
const LLT EltTy = Ty.getScalarType();		const LLT EltTy = Ty.getScalarType();
const bool IsD16 = Ty.getScalarType() == S16;		const bool IsD16 = Ty.getScalarType() == S16;
const unsigned NumElts = Ty.isVector() ? Ty.getNumElements() : 1;		const unsigned NumElts = Ty.isVector() ? Ty.getNumElements() : 1;
... (68 lines not shown)
		else if (ST.hasUnpackedD16VMem())
truncToS16Vector(B, DstReg, DataPart);		truncToS16Vector(B, DstReg, DataPart);
else		else
bitcastToS16Vector(B, DstReg, DataPart);		bitcastToS16Vector(B, DstReg, DataPart);

return true;		return true;
}		}

// Must be an image load.		// Must be an image load.
if (!Ty.isVector() || Ty.getElementType() != S16)		if (!ST.hasUnpackedD16VMem() || !Ty.isVector() || Ty.getElementType() != S16)
return true;		return true;

B.setInsertPt(*MI.getParent(), ++MI.getIterator());		B.setInsertPt(*MI.getParent(), ++MI.getIterator());

LLT WidenedTy = Ty.changeElementType(S32);		LLT WidenedTy = Ty.changeElementType(S32);
Register WideDstReg = MRI->createGenericVirtualRegister(WidenedTy);		Register WideDstReg = MRI->createGenericVirtualRegister(WidenedTy);

Observer.changingInstr(MI);		Observer.changingInstr(MI);
... (211 lines not shown)
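
The operand rewrite in the a16 branch above can be hard to picture from the diff alone, so here is a minimal standalone sketch of the slot bookkeeping. It is not the legalizer code: `shouldUseNSA` and `rewriteA16Slots` are hypothetical names, registers are modelled only as counts of 16-bit components, and a count of 0 stands in for an operand set to `AMDGPU::NoRegister`. It assumes, as the patch does, that a single-address instruction is left unchanged.

```cpp
#include <cassert>
#include <vector>

// Mirrors the heuristic in the patch: use the NSA encoding only for three or
// more addresses, and only when the target has the feature.
bool shouldUseNSA(int numVAddrs, bool hasNSAEncoding) {
  return numVAddrs >= 3 && hasNSAEncoding;
}

// For each original address operand slot, return how many 16-bit address
// components it carries after the rewrite. 0 models a dead slot (NoRegister).
// Hypothetical model only; the real code rewrites MachineOperand registers.
std::vector<int> rewriteA16Slots(int numVAddrs, bool hasNSAEncoding) {
  assert(numVAddrs >= 1 && "expected at least one address operand");
  std::vector<int> slots(numVAddrs, 1); // one s16 component per slot initially
  if (numVAddrs == 1)
    return slots; // single-address case is not repacked

  // Pack pairs of components into dwords; the last dword may be half full.
  std::vector<int> packed;
  for (int i = 0; i < numVAddrs; i += 2)
    packed.push_back(i + 1 < numVAddrs ? 2 : 1);

  // Non-NSA: concatenate all dwords into one wide register.
  if (!shouldUseNSA(numVAddrs, hasNSAEncoding))
    packed.assign(1, numVAddrs);

  // Live registers occupy the leading slots; the rest become dead.
  for (int i = 0; i < numVAddrs; ++i)
    slots[i] = i < static_cast<int>(packed.size()) ? packed[i] : 0;
  return slots;
}
```

With three addresses on an NSA target the sketch yields two live slots and one dead slot; without NSA the three components collapse into the first slot. The operand count stays the same either way, which is how the patch avoids a family of new intermediate operations.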

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-llvm.amdgcn.image.atomic.dim.a16.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -stop-after=legalizer -o - %s | FileCheck -check-prefix=GFX9 %s
				; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -mattr=+r128-a16 -stop-after=legalizer -o - %s | FileCheck -check-prefix=GFX10NSA %s

				define amdgpu_ps float @atomic_swap_1d(<8 x i32> inreg %rsrc, i32 %data, i16 %s) {
				; GFX9-LABEL: name: atomic_swap_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.swap.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_swap_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.swap.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.swap.1d.i32.i16(i32 %data, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_add_1d(<8 x i32> inreg %rsrc, i32 %data, i16 %s) {
				; GFX9-LABEL: name: atomic_add_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_add_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.add.1d.i32.i16(i32 %data, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_sub_1d(<8 x i32> inreg %rsrc, i32 %data, i16 %s) {
				; GFX9-LABEL: name: atomic_sub_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.sub.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_sub_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.sub.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.sub.1d.i32.i16(i32 %data, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_smin_1d(<8 x i32> inreg %rsrc, i32 %data, i16 %s) {
				; GFX9-LABEL: name: atomic_smin_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.smin.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_smin_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.smin.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.smin.1d.i32.i16(i32 %data, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}


				define amdgpu_ps float @atomic_umin_1d(<8 x i32> inreg %rsrc, i32 %data, i16 %s) {
				; GFX9-LABEL: name: atomic_umin_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.umin.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_umin_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.umin.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.umin.1d.i32.i16(i32 %data, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_smax_1d(<8 x i32> inreg %rsrc, i32 %data, i16 %s) {
				; GFX9-LABEL: name: atomic_smax_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.smax.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_smax_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.smax.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.smax.1d.i32.i16(i32 %data, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_umax_1d(<8 x i32> inreg %rsrc, i32 %data, i16 %s) {
				; GFX9-LABEL: name: atomic_umax_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.umax.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_umax_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.umax.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.umax.1d.i32.i16(i32 %data, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_and_1d(<8 x i32> inreg %rsrc, i32 %data, i16 %s) {
				; GFX9-LABEL: name: atomic_and_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.and.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_and_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.and.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.and.1d.i32.i16(i32 %data, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_or_1d(<8 x i32> inreg %rsrc, i32 %data, i16 %s) {
				; GFX9-LABEL: name: atomic_or_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.or.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_or_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.or.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.or.1d.i32.i16(i32 %data, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_xor_1d(<8 x i32> inreg %rsrc, i32 %data, i16 %s) {
				; GFX9-LABEL: name: atomic_xor_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.xor.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_xor_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.xor.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.xor.1d.i32.i16(i32 %data, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_inc_1d(<8 x i32> inreg %rsrc, i32 %data, i16 %s) {
				; GFX9-LABEL: name: atomic_inc_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.inc.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_inc_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.inc.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.inc.1d.i32.i16(i32 %data, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_dec_1d(<8 x i32> inreg %rsrc, i32 %data, i16 %s) {
				; GFX9-LABEL: name: atomic_dec_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.dec.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_dec_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.dec.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.dec.1d.i32.i16(i32 %data, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_cmpswap_1d(<8 x i32> inreg %rsrc, i32 %cmp, i32 %swap, i16 %s) {
				; GFX9-LABEL: name: atomic_cmpswap_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY10]](s32)
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.cmpswap.1d), [[COPY8]](s32), [[COPY9]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_cmpswap_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY10]](s32)
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.cmpswap.1d), [[COPY8]](s32), [[COPY9]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.cmpswap.1d.i32.i16(i32 %cmp, i32 %swap, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_add_2d(<8 x i32> inreg %rsrc, i32 %data, i16 %s, i16 %t) {
				; GFX9-LABEL: name: atomic_add_2d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY [[COPY9]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY11]](s32), [[COPY12]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.2d), [[COPY8]](s32), [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_add_2d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY [[COPY9]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY11]](s32), [[COPY12]](s32)
				; GFX10NSA: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.2d), [[COPY8]](s32), [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.add.2d.i32.i16(i32 %data, i16 %s, i16 %t, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_add_3d(<8 x i32> inreg %rsrc, i32 %data, i16 %s, i16 %t, i16 %r) {
				; GFX9-LABEL: name: atomic_add_3d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[COPY9]](s32)
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[COPY11]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.3d), [[COPY8]](s32), [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_add_3d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[COPY9]](s32)
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[COPY11]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[DEF]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.3d), [[COPY8]](s32), [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.add.3d.i32.i16(i32 %data, i16 %s, i16 %t, i16 %r, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_add_cube(<8 x i32> inreg %rsrc, i32 %data, i16 %s, i16 %t, i16 %face) {
				; GFX9-LABEL: name: atomic_add_cube
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[COPY9]](s32)
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[COPY11]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.cube), [[COPY8]](s32), [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_add_cube
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[COPY9]](s32)
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[COPY11]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[DEF]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.cube), [[COPY8]](s32), [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.add.cube.i32.i16(i32 %data, i16 %s, i16 %t, i16 %face, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_add_1darray(<8 x i32> inreg %rsrc, i32 %data, i16 %s, i16 %slice) {
				; GFX9-LABEL: name: atomic_add_1darray
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY [[COPY9]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY11]](s32), [[COPY12]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.1darray), [[COPY8]](s32), [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_add_1darray
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY [[COPY9]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY11]](s32), [[COPY12]](s32)
				; GFX10NSA: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.1darray), [[COPY8]](s32), [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.add.1darray.i32.i16(i32 %data, i16 %s, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_add_2darray(<8 x i32> inreg %rsrc, i32 %data, i16 %s, i16 %t, i16 %slice) {
				; GFX9-LABEL: name: atomic_add_2darray
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[COPY9]](s32)
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[COPY11]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.2darray), [[COPY8]](s32), [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_add_2darray
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[COPY9]](s32)
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[COPY11]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[DEF]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.2darray), [[COPY8]](s32), [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.add.2darray.i32.i16(i32 %data, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_add_2dmsaa(<8 x i32> inreg %rsrc, i32 %data, i16 %s, i16 %t, i16 %fragid) {
				; GFX9-LABEL: name: atomic_add_2dmsaa
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[COPY9]](s32)
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[COPY11]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.2dmsaa), [[COPY8]](s32), [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_add_2dmsaa
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[COPY9]](s32)
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[COPY11]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[DEF]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.2dmsaa), [[COPY8]](s32), [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.add.2dmsaa.i32.i16(i32 %data, i16 %s, i16 %t, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_add_2darraymsaa(<8 x i32> inreg %rsrc, i32 %data, i16 %s, i16 %t, i16 %slice, i16 %fragid) {
				; GFX9-LABEL: name: atomic_add_2darraymsaa
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr4
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[COPY9]](s32)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
				; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY [[COPY11]](s32)
				; GFX9: [[COPY16:%[0-9]+]]:_(s32) = COPY [[COPY12]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY15]](s32), [[COPY16]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.2darraymsaa), [[COPY8]](s32), [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_add_2darraymsaa
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr4
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(s32) = COPY [[COPY9]](s32)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
				; GFX10NSA: [[COPY15:%[0-9]+]]:_(s32) = COPY [[COPY11]](s32)
				; GFX10NSA: [[COPY16:%[0-9]+]]:_(s32) = COPY [[COPY12]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY15]](s32), [[COPY16]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.2darraymsaa), [[COPY8]](s32), [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.add.2darraymsaa.i32.i16(i32 %data, i16 %s, i16 %t, i16 %slice, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_add_1d_slc(<8 x i32> inreg %rsrc, i32 %data, i16 %s) {
				; GFX9-LABEL: name: atomic_add_1d_slc
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 2 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_add_1d_slc
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY9]](s32)
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.add.1d), [[COPY8]](s32), [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 2 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.add.1d.i32.i16(i32 %data, i16 %s, <8 x i32> %rsrc, i32 0, i32 2)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_cmpswap_2d(<8 x i32> inreg %rsrc, i32 %cmp, i32 %swap, i16 %s, i16 %t) {
				; GFX9-LABEL: name: atomic_cmpswap_2d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[COPY11]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.cmpswap.2d), [[COPY8]](s32), [[COPY9]](s32), [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_cmpswap_2d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(s32) = COPY [[COPY11]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX10NSA: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.cmpswap.2d), [[COPY8]](s32), [[COPY9]](s32), [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.cmpswap.2d.i32.i16(i32 %cmp, i32 %swap, i16 %s, i16 %t, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_cmpswap_3d(<8 x i32> inreg %rsrc, i32 %cmp, i32 %swap, i16 %s, i16 %t, i16 %r) {
				; GFX9-LABEL: name: atomic_cmpswap_3d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr4
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[COPY11]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
				; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY [[COPY12]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY15]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.cmpswap.3d), [[COPY8]](s32), [[COPY9]](s32), [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_cmpswap_3d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr4
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[COPY11]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
				; GFX10NSA: [[COPY15:%[0-9]+]]:_(s32) = COPY [[COPY12]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY15]](s32), [[DEF]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.cmpswap.3d), [[COPY8]](s32), [[COPY9]](s32), [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.cmpswap.3d.i32.i16(i32 %cmp, i32 %swap, i16 %s, i16 %t, i16 %r, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				define amdgpu_ps float @atomic_cmpswap_2darraymsaa(<8 x i32> inreg %rsrc, i32 %cmp, i32 %swap, i16 %s, i16 %t, i16 %slice, i16 %fragid) {
				; GFX9-LABEL: name: atomic_cmpswap_2darraymsaa
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr4
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr5
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY [[COPY11]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX9: [[COPY16:%[0-9]+]]:_(s32) = COPY [[COPY12]](s32)
				; GFX9: [[COPY17:%[0-9]+]]:_(s32) = COPY [[COPY13]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[COPY17]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.cmpswap.2darraymsaa), [[COPY8]](s32), [[COPY9]](s32), [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: atomic_cmpswap_2darraymsaa
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr4
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr5
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[COPY10]](s32)
				; GFX10NSA: [[COPY15:%[0-9]+]]:_(s32) = COPY [[COPY11]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX10NSA: [[COPY16:%[0-9]+]]:_(s32) = COPY [[COPY12]](s32)
				; GFX10NSA: [[COPY17:%[0-9]+]]:_(s32) = COPY [[COPY13]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[COPY17]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.atomic.cmpswap.2darraymsaa), [[COPY8]](s32), [[COPY9]](s32), [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (volatile dereferenceable load store 4 on custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%v = call i32 @llvm.amdgcn.image.atomic.cmpswap.2darraymsaa.i32.i16(i32 %cmp, i32 %swap, i16 %s, i16 %t, i16 %slice, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
				%out = bitcast i32 %v to float
				ret float %out
				}

				declare i32 @llvm.amdgcn.image.atomic.swap.1d.i32.i16(i32, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.add.1d.i32.i16(i32, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.sub.1d.i32.i16(i32, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.smin.1d.i32.i16(i32, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.umin.1d.i32.i16(i32, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.smax.1d.i32.i16(i32, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.umax.1d.i32.i16(i32, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.and.1d.i32.i16(i32, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.or.1d.i32.i16(i32, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.xor.1d.i32.i16(i32, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.inc.1d.i32.i16(i32, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.dec.1d.i32.i16(i32, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.cmpswap.1d.i32.i16(i32, i32, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.add.2d.i32.i16(i32, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.add.3d.i32.i16(i32, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.add.cube.i32.i16(i32, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.add.1darray.i32.i16(i32, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.add.2darray.i32.i16(i32, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.add.2dmsaa.i32.i16(i32, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.add.2darraymsaa.i32.i16(i32, i16, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.cmpswap.2d.i32.i16(i32, i32, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.cmpswap.3d.i32.i16(i32, i32, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.cmpswap.cube.i32.i16(i32, i32, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.cmpswap.1darray.i32.i16(i32, i32, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.cmpswap.2darray.i32.i16(i32, i32, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.cmpswap.2dmsaa.i32.i16(i32, i32, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #0
				declare i32 @llvm.amdgcn.image.atomic.cmpswap.2darraymsaa.i32.i16(i32, i32, i16, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #0

				attributes #0 = { nounwind }

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-llvm.amdgcn.image.dim.a16.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -stop-after=legalizer -o - %s | FileCheck -check-prefix=GFX9 %s
				; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -mattr=+r128-a16 -stop-after=legalizer -o - %s | FileCheck -check-prefix=GFX10NSA %s

				define amdgpu_ps <4 x float> @load_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: load_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.1d), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.1d), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: load_2d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY9]](s32), [[COPY10]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.2d), 15, [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_2d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY9]](s32), [[COPY10]](s32)
				; GFX10NSA: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.2d), 15, [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%t = extractelement <2 x i16> %coords, i32 1
				%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 15, i16 %s, i16 %t, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: load_3d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.3d), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_3d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.3d), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%r = extractelement <2 x i16> %coords_hi, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %r, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_cube(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: load_cube
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.cube), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_cube
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.cube), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.cube.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_1darray(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: load_1darray
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY9]](s32), [[COPY10]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.1darray), 15, [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_1darray
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY9]](s32), [[COPY10]](s32)
				; GFX10NSA: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.1darray), 15, [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%slice = extractelement <2 x i16> %coords, i32 1
				%v = call <4 x float> @llvm.amdgcn.image.load.1darray.v4f32.i16(i32 15, i16 %s, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_2darray(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: load_2darray
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.2darray), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_2darray
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.2darray), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.2darray.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_2dmsaa(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: load_2dmsaa
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.2dmsaa), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_2dmsaa
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.2dmsaa), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%fragid = extractelement <2 x i16> %coords_hi, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.2dmsaa.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_2darraymsaa(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: load_2darraymsaa
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.2darraymsaa), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_2darraymsaa
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.2darraymsaa), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%fragid = extractelement <2 x i16> %coords_hi, i32 1
				%v = call <4 x float> @llvm.amdgcn.image.load.2darraymsaa.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_mip_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: load_mip_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY9]](s32), [[COPY10]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.mip.1d), 15, [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_mip_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY9]](s32), [[COPY10]](s32)
				; GFX10NSA: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.mip.1d), 15, [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%mip = extractelement <2 x i16> %coords, i32 1
				%v = call <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i16(i32 15, i16 %s, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_mip_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: load_mip_2d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.mip.2d), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_mip_2d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.mip.2d), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%mip = extractelement <2 x i16> %coords_hi, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_mip_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: load_mip_3d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.mip.3d), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_mip_3d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.mip.3d), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%r = extractelement <2 x i16> %coords_hi, i32 0
				%mip = extractelement <2 x i16> %coords_hi, i32 1
				%v = call <4 x float> @llvm.amdgcn.image.load.mip.3d.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %r, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_mip_cube(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: load_mip_cube
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.mip.cube), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_mip_cube
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.mip.cube), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%mip = extractelement <2 x i16> %coords_hi, i32 1
				%v = call <4 x float> @llvm.amdgcn.image.load.mip.cube.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_mip_1darray(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: load_mip_1darray
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.mip.1darray), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_mip_1darray
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.mip.1darray), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%slice = extractelement <2 x i16> %coords_lo, i32 1
				%mip = extractelement <2 x i16> %coords_hi, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.mip.1darray.v4f32.i16(i32 15, i16 %s, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_mip_2darray(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: load_mip_2darray
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.mip.2darray), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_mip_2darray
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.mip.2darray), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%mip = extractelement <2 x i16> %coords_hi, i32 1
				%v = call <4 x float> @llvm.amdgcn.image.load.mip.2darray.v4f32.i16(i32 15, i16 %s, i16 %t, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps void @store_1d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: name: store_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.1d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.1d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_2d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: name: store_2d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.2d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_2d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
				; GFX10NSA: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.2d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%t = extractelement <2 x i16> %coords, i32 1
				call void @llvm.amdgcn.image.store.2d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_3d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: store_3d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX9: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.3d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_3d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX10NSA: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[DEF]](s32)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.3d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%r = extractelement <2 x i16> %coords_hi, i32 0
				call void @llvm.amdgcn.image.store.3d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %r, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_cube(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: store_cube
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX9: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.cube), [[BUILD_VECTOR1]](<4 x s32>), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_cube
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX10NSA: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[DEF]](s32)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.cube), [[BUILD_VECTOR1]](<4 x s32>), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				call void @llvm.amdgcn.image.store.cube.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_1darray(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: name: store_1darray
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.1darray), [[BUILD_VECTOR1]](<4 x s32>), 15, [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_1darray
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
				; GFX10NSA: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.1darray), [[BUILD_VECTOR1]](<4 x s32>), 15, [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%slice = extractelement <2 x i16> %coords, i32 1
				call void @llvm.amdgcn.image.store.1darray.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_2darray(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: store_2darray
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX9: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.2darray), [[BUILD_VECTOR1]](<4 x s32>), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_2darray
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX10NSA: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[DEF]](s32)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.2darray), [[BUILD_VECTOR1]](<4 x s32>), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				call void @llvm.amdgcn.image.store.2darray.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_2dmsaa(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: store_2dmsaa
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX9: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.2dmsaa), [[BUILD_VECTOR1]](<4 x s32>), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_2dmsaa
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX10NSA: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[DEF]](s32)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.2dmsaa), [[BUILD_VECTOR1]](<4 x s32>), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%fragid = extractelement <2 x i16> %coords_hi, i32 0
				call void @llvm.amdgcn.image.store.2dmsaa.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_2darraymsaa(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: store_2darraymsaa
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX9: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX9: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX9: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[COPY17:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[COPY17]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.2darraymsaa), [[BUILD_VECTOR1]](<4 x s32>), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_2darraymsaa
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX10NSA: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX10NSA: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX10NSA: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[COPY17:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[COPY17]](s32)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.2darraymsaa), [[BUILD_VECTOR1]](<4 x s32>), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%fragid = extractelement <2 x i16> %coords_hi, i32 1
				call void @llvm.amdgcn.image.store.2darraymsaa.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, i16 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_mip_1d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: name: store_mip_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.mip.1d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_mip_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
				; GFX10NSA: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.mip.1d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%mip = extractelement <2 x i16> %coords, i32 1
				call void @llvm.amdgcn.image.store.mip.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_mip_2d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: store_mip_2d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX9: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.mip.2d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_mip_2d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX10NSA: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[DEF]](s32)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.mip.2d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%mip = extractelement <2 x i16> %coords_hi, i32 0
				call void @llvm.amdgcn.image.store.mip.2d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_mip_3d(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: store_mip_3d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX9: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX9: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX9: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[COPY17:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[COPY17]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.mip.3d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_mip_3d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX10NSA: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX10NSA: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX10NSA: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[COPY17:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[COPY17]](s32)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.mip.3d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%r = extractelement <2 x i16> %coords_hi, i32 0
				%mip = extractelement <2 x i16> %coords_hi, i32 1
				call void @llvm.amdgcn.image.store.mip.3d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %r, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_mip_cube(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: store_mip_cube
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX9: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX9: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX9: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[COPY17:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[COPY17]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.mip.cube), [[BUILD_VECTOR1]](<4 x s32>), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_mip_cube
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX10NSA: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX10NSA: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX10NSA: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[COPY17:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[COPY17]](s32)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.mip.cube), [[BUILD_VECTOR1]](<4 x s32>), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%mip = extractelement <2 x i16> %coords_hi, i32 1
				call void @llvm.amdgcn.image.store.mip.cube.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_mip_1darray(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: store_mip_1darray
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX9: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[DEF]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.mip.1darray), [[BUILD_VECTOR1]](<4 x s32>), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_mip_1darray
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX10NSA: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[DEF]](s32)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.mip.1darray), [[BUILD_VECTOR1]](<4 x s32>), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%slice = extractelement <2 x i16> %coords_lo, i32 1
				%mip = extractelement <2 x i16> %coords_hi, i32 0
				call void @llvm.amdgcn.image.store.mip.1darray.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_mip_2darray(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: store_mip_2darray
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX9: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX9: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX9: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[COPY17:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[COPY17]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.mip.2darray), [[BUILD_VECTOR1]](<4 x s32>), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_mip_2darray
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr5
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX10NSA: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY13]](<2 x s16>)
				; GFX10NSA: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX10NSA: [[COPY14:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY15:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
				; GFX10NSA: [[COPY16:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[COPY17:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[COPY17]](s32)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.mip.2darray), [[BUILD_VECTOR1]](<4 x s32>), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%mip = extractelement <2 x i16> %coords_hi, i32 1
				call void @llvm.amdgcn.image.store.mip.2darray.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, i16 %t, i16 %slice, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps <4 x float> @getresinfo_1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: getresinfo_1d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.1d), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: getresinfo_1d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.1d), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.getresinfo.1d.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @getresinfo_2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: getresinfo_2d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.2d), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: getresinfo_2d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.2d), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.getresinfo.2d.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @getresinfo_3d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: getresinfo_3d
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.3d), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: getresinfo_3d
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.3d), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.getresinfo.3d.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @getresinfo_cube(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: getresinfo_cube
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.cube), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: getresinfo_cube
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.cube), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.getresinfo.cube.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @getresinfo_1darray(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: getresinfo_1darray
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.1darray), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: getresinfo_1darray
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.1darray), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.getresinfo.1darray.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @getresinfo_2darray(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: getresinfo_2darray
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.2darray), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: getresinfo_2darray
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.2darray), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.getresinfo.2darray.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @getresinfo_2dmsaa(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: getresinfo_2dmsaa
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.2dmsaa), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: getresinfo_2dmsaa
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.2dmsaa), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.getresinfo.2dmsaa.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @getresinfo_2darraymsaa(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: getresinfo_2darraymsaa
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.2darraymsaa), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: getresinfo_2darraymsaa
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.2darraymsaa), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.getresinfo.2darraymsaa.v4f32.i16(i32 15, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %v
				}

				define amdgpu_ps float @load_1d_V1(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: load_1d_V1
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.1d), 8, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 4 from custom "TargetCustom8")
				; GFX9: $vgpr0 = COPY [[INT]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0
				; GFX10NSA-LABEL: name: load_1d_V1
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.1d), 8, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 4 from custom "TargetCustom8")
				; GFX10NSA: $vgpr0 = COPY [[INT]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%v = call float @llvm.amdgcn.image.load.1d.f32.i16(i32 8, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				ret float %v
				}

				define amdgpu_ps <2 x float> @load_1d_V2(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: load_1d_V2
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(<2 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.1d), 9, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 8 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<2 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1
				; GFX10NSA-LABEL: name: load_1d_V2
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<2 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.1d), 9, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable load 8 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<2 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%v = call <2 x float> @llvm.amdgcn.image.load.1d.v2f32.i16(i32 9, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				ret <2 x float> %v
				}

				define amdgpu_ps void @store_1d_V1(<8 x i32> inreg %rsrc, float %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: name: store_1d_V1
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.1d), [[COPY8]](s32), 2, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 4 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_1d_V1
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.1d), [[COPY8]](s32), 2, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 4 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				call void @llvm.amdgcn.image.store.1d.f32.i16(float %vdata, i32 2, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps void @store_1d_V2(<8 x i32> inreg %rsrc, <2 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: name: store_1d_V2
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr2
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY10]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.1d), [[BUILD_VECTOR1]](<2 x s32>), 12, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 8 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_1d_V2
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr2
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY10]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.1d), [[BUILD_VECTOR1]](<2 x s32>), 12, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0 :: (dereferenceable store 8 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				call void @llvm.amdgcn.image.store.1d.v2f32.i16(<2 x float> %vdata, i32 12, i16 %s, <8 x i32> %rsrc, i32 0, i32 0)
				ret void
				}

				define amdgpu_ps <4 x float> @load_1d_glc(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: load_1d_glc
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.1d), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 1 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_1d_glc
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.1d), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 1 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 1)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_1d_slc(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: load_1d_slc
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.1d), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 2 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_1d_slc
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.1d), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 2 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 2)
				ret <4 x float> %v
				}

				define amdgpu_ps <4 x float> @load_1d_glc_slc(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: load_1d_glc_slc
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.1d), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 3 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_1d_glc_slc
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.1d), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 3 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 3)
				ret <4 x float> %v
				}

				define amdgpu_ps void @store_1d_glc(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: name: store_1d_glc
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.1d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 1 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_1d_glc
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.1d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 1 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 1)
				ret void
				}

				define amdgpu_ps void @store_1d_slc(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: name: store_1d_slc
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.1d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 2 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_1d_slc
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.1d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 2 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 2)
				ret void
				}

				define amdgpu_ps void @store_1d_glc_slc(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: name: store_1d_glc_slc
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.1d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 3 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX9: S_ENDPGM 0
				; GFX10NSA-LABEL: name: store_1d_glc_slc
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr4
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY12]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.1d), [[BUILD_VECTOR1]](<4 x s32>), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 3 :: (dereferenceable store 16 into custom "TargetCustom8")
				; GFX10NSA: S_ENDPGM 0
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				call void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float> %vdata, i32 15, i16 %s, <8 x i32> %rsrc, i32 0, i32 3)
				ret void
				}

				define amdgpu_ps <4 x float> @getresinfo_dmask0(<8 x i32> inreg %rsrc, <4 x float> %vdata, <2 x i16> %coords) {
				; GFX9-LABEL: name: getresinfo_dmask0
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.1d), 0, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: getresinfo_dmask0
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.image.getresinfo.1d), 0, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 0, 0
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<4 x s32>)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%mip = extractelement <2 x i16> %coords, i32 0
				%r = call <4 x float> @llvm.amdgcn.image.getresinfo.1d.v4f32.i16(i32 0, i16 %mip, <8 x i32> %rsrc, i32 0, i32 0)
				ret <4 x float> %r
				}

				define amdgpu_ps <4 x float> @load_1d_tfe(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: load_1d_tfe
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX9: [[INT:%[0-9]+]]:_(<5 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.1d), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 1, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32), [[UV4:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<5 x s32>)
				; GFX9: G_STORE [[UV4]](s32), [[DEF]](p1) :: (store 4 into `i32 addrspace(1)* undef`, addrspace 1)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_1d_tfe
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[BITCAST]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<5 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.1d), 15, [[TRUNC]](s16), [[BUILD_VECTOR]](<8 x s32>), 1, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32), [[UV4:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<5 x s32>)
				; GFX10NSA: G_STORE [[UV4]](s32), [[DEF]](p1) :: (store 4 into `i32 addrspace(1)* undef`, addrspace 1)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%v = call { <4 x float>, i32 } @llvm.amdgcn.image.load.1d.sl_v4f32i32s.i16(i32 15, i16 %s, <8 x i32> %rsrc, i32 1, i32 0)
				%data = extractvalue { <4 x float>, i32 } %v, 0
				%tfe = extractvalue { <4 x float>, i32 } %v, 1
				store i32 %tfe, i32 addrspace(1)* undef
				ret <4 x float> %data
				}

				define amdgpu_ps <4 x float> @load_2d_tfe(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
				; GFX9-LABEL: name: load_2d_tfe
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY9]](s32), [[COPY10]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(<5 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.2d), 15, [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 1, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32), [[UV4:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<5 x s32>)
				; GFX9: G_STORE [[UV4]](s32), [[DEF]](p1) :: (store 4 into `i32 addrspace(1)* undef`, addrspace 1)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_2d_tfe
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY9]](s32), [[COPY10]](s32)
				; GFX10NSA: [[CONCAT_VECTORS:%[0-9]+]]:_(<2 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<5 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.2d), 15, [[CONCAT_VECTORS]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 1, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32), [[UV4:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<5 x s32>)
				; GFX10NSA: G_STORE [[UV4]](s32), [[DEF]](p1) :: (store 4 into `i32 addrspace(1)* undef`, addrspace 1)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords, i32 0
				%t = extractelement <2 x i16> %coords, i32 1
				%v = call { <4 x float>, i32 } @llvm.amdgcn.image.load.2d.sl_v4f32i32s.i16(i32 15, i16 %s, i16 %t, <8 x i32> %rsrc, i32 1, i32 0)
				%data = extractvalue { <4 x float>, i32 } %v, 0
				%tfe = extractvalue { <4 x float>, i32 } %v, 1
				store i32 %tfe, i32 addrspace(1)* undef
				ret <4 x float> %data
				}

				define amdgpu_ps <4 x float> @load_3d_tfe(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: load_3d_tfe
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[DEF1:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF1]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(<5 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.3d), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 1, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32), [[UV4:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<5 x s32>)
				; GFX9: G_STORE [[UV4]](s32), [[DEF]](p1) :: (store 4 into `i32 addrspace(1)* undef`, addrspace 1)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_3d_tfe
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[DEF1:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF1]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<5 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.3d), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), 1, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32), [[UV4:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<5 x s32>)
				; GFX10NSA: G_STORE [[UV4]](s32), [[DEF]](p1) :: (store 4 into `i32 addrspace(1)* undef`, addrspace 1)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%r = extractelement <2 x i16> %coords_hi, i32 0
				%v = call { <4 x float>, i32 } @llvm.amdgcn.image.load.3d.sl_v4f32i32s.i16(i32 15, i16 %s, i16 %t, i16 %r, <8 x i32> %rsrc, i32 1, i32 0)
				%data = extractvalue { <4 x float>, i32 } %v, 0
				%tfe = extractvalue { <4 x float>, i32 } %v, 1
				store i32 %tfe, i32 addrspace(1)* undef
				ret <4 x float> %data
				}

				define amdgpu_ps <4 x float> @load_2darraymsaa_tfe(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
				; GFX9-LABEL: name: load_2darraymsaa_tfe
				; GFX9: bb.1.main_body:
				; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX9: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX9: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX9: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX9: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
				; GFX9: [[INT:%[0-9]+]]:_(<5 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.2darraymsaa), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 1, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32), [[UV4:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<5 x s32>)
				; GFX9: G_STORE [[UV4]](s32), [[DEF]](p1) :: (store 4 into `i32 addrspace(1)* undef`, addrspace 1)
				; GFX9: $vgpr0 = COPY [[UV]](s32)
				; GFX9: $vgpr1 = COPY [[UV1]](s32)
				; GFX9: $vgpr2 = COPY [[UV2]](s32)
				; GFX9: $vgpr3 = COPY [[UV3]](s32)
				; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				; GFX10NSA-LABEL: name: load_2darraymsaa_tfe
				; GFX10NSA: bb.1.main_body:
				; GFX10NSA: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $vgpr0, $vgpr1
				; GFX10NSA: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
				; GFX10NSA: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
				; GFX10NSA: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
				; GFX10NSA: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
				; GFX10NSA: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
				; GFX10NSA: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
				; GFX10NSA: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
				; GFX10NSA: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
				; GFX10NSA: [[COPY8:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr0
				; GFX10NSA: [[COPY9:%[0-9]+]]:_(<2 x s16>) = COPY $vgpr1
				; GFX10NSA: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
				; GFX10NSA: [[DEF:%[0-9]+]]:_(p1) = G_IMPLICIT_DEF
				; GFX10NSA: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[COPY8]](<2 x s16>)
				; GFX10NSA: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX10NSA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C]](s32)
				; GFX10NSA: [[BITCAST2:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY9]](<2 x s16>)
				; GFX10NSA: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
				; GFX10NSA: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX10NSA: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY10]](s32), [[COPY11]](s32)
				; GFX10NSA: [[COPY12:%[0-9]+]]:_(s32) = COPY [[BITCAST2]](s32)
				; GFX10NSA: [[COPY13:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX10NSA: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[COPY13]](s32)
				; GFX10NSA: [[INT:%[0-9]+]]:_(<5 x s32>) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.load.2darraymsaa), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), 1, 0 :: (dereferenceable load 16 from custom "TargetCustom8")
				; GFX10NSA: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32), [[UV4:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](<5 x s32>)
				; GFX10NSA: G_STORE [[UV4]](s32), [[DEF]](p1) :: (store 4 into `i32 addrspace(1)* undef`, addrspace 1)
				; GFX10NSA: $vgpr0 = COPY [[UV]](s32)
				; GFX10NSA: $vgpr1 = COPY [[UV1]](s32)
				; GFX10NSA: $vgpr2 = COPY [[UV2]](s32)
				; GFX10NSA: $vgpr3 = COPY [[UV3]](s32)
				; GFX10NSA: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
				main_body:
				%s = extractelement <2 x i16> %coords_lo, i32 0
				%t = extractelement <2 x i16> %coords_lo, i32 1
				%slice = extractelement <2 x i16> %coords_hi, i32 0
				%fragid = extractelement <2 x i16> %coords_hi, i32 1
				%v = call { <4 x float>, i32 } @llvm.amdgcn.image.load.2darraymsaa.sl_v4f32i32s.i16(i32 15, i16 %s, i16 %t, i16 %slice, i16 %fragid, <8 x i32> %rsrc, i32 1, i32 0)
				%data = extractvalue { <4 x float>, i32 } %v, 0
				%tfe = extractvalue { <4 x float>, i32 } %v, 1
				store i32 %tfe, i32 addrspace(1)* undef
				ret <4 x float> %data
				}

				declare <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 immarg, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 immarg, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 immarg, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare <4 x float> @llvm.amdgcn.image.load.cube.v4f32.i16(i32 immarg, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare <4 x float> @llvm.amdgcn.image.load.1darray.v4f32.i16(i32 immarg, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare <4 x float> @llvm.amdgcn.image.load.2darray.v4f32.i16(i32 immarg, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare <4 x float> @llvm.amdgcn.image.load.2dmsaa.v4f32.i16(i32 immarg, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare <4 x float> @llvm.amdgcn.image.load.2darraymsaa.v4f32.i16(i32 immarg, i16, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i16(i32 immarg, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i16(i32 immarg, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare <4 x float> @llvm.amdgcn.image.load.mip.3d.v4f32.i16(i32 immarg, i16, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare <4 x float> @llvm.amdgcn.image.load.mip.cube.v4f32.i16(i32 immarg, i16, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare <4 x float> @llvm.amdgcn.image.load.mip.1darray.v4f32.i16(i32 immarg, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare <4 x float> @llvm.amdgcn.image.load.mip.2darray.v4f32.i16(i32 immarg, i16, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare void @llvm.amdgcn.image.store.1d.v4f32.i16(<4 x float>, i32 immarg, i16, <8 x i32>, i32 immarg, i32 immarg) #2
				declare void @llvm.amdgcn.image.store.2d.v4f32.i16(<4 x float>, i32 immarg, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #2
				declare void @llvm.amdgcn.image.store.3d.v4f32.i16(<4 x float>, i32 immarg, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #2
				declare void @llvm.amdgcn.image.store.cube.v4f32.i16(<4 x float>, i32 immarg, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #2
				declare void @llvm.amdgcn.image.store.1darray.v4f32.i16(<4 x float>, i32 immarg, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #2
				declare void @llvm.amdgcn.image.store.2darray.v4f32.i16(<4 x float>, i32 immarg, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #2
				declare void @llvm.amdgcn.image.store.2dmsaa.v4f32.i16(<4 x float>, i32 immarg, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #2
				declare void @llvm.amdgcn.image.store.2darraymsaa.v4f32.i16(<4 x float>, i32 immarg, i16, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #2
				declare void @llvm.amdgcn.image.store.mip.1d.v4f32.i16(<4 x float>, i32 immarg, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #2
				declare void @llvm.amdgcn.image.store.mip.2d.v4f32.i16(<4 x float>, i32 immarg, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #2
				declare void @llvm.amdgcn.image.store.mip.3d.v4f32.i16(<4 x float>, i32 immarg, i16, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #2
				declare void @llvm.amdgcn.image.store.mip.cube.v4f32.i16(<4 x float>, i32 immarg, i16, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #2
				declare void @llvm.amdgcn.image.store.mip.1darray.v4f32.i16(<4 x float>, i32 immarg, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #2
				declare void @llvm.amdgcn.image.store.mip.2darray.v4f32.i16(<4 x float>, i32 immarg, i16, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #2
				declare <4 x float> @llvm.amdgcn.image.getresinfo.1d.v4f32.i16(i32 immarg, i16, <8 x i32>, i32 immarg, i32 immarg) #3
				declare <4 x float> @llvm.amdgcn.image.getresinfo.2d.v4f32.i16(i32 immarg, i16, <8 x i32>, i32 immarg, i32 immarg) #3
				declare <4 x float> @llvm.amdgcn.image.getresinfo.3d.v4f32.i16(i32 immarg, i16, <8 x i32>, i32 immarg, i32 immarg) #3
				declare <4 x float> @llvm.amdgcn.image.getresinfo.cube.v4f32.i16(i32 immarg, i16, <8 x i32>, i32 immarg, i32 immarg) #3
				declare <4 x float> @llvm.amdgcn.image.getresinfo.1darray.v4f32.i16(i32 immarg, i16, <8 x i32>, i32 immarg, i32 immarg) #3
				declare <4 x float> @llvm.amdgcn.image.getresinfo.2darray.v4f32.i16(i32 immarg, i16, <8 x i32>, i32 immarg, i32 immarg) #3
				declare <4 x float> @llvm.amdgcn.image.getresinfo.2dmsaa.v4f32.i16(i32 immarg, i16, <8 x i32>, i32 immarg, i32 immarg) #3
				declare <4 x float> @llvm.amdgcn.image.getresinfo.2darraymsaa.v4f32.i16(i32 immarg, i16, <8 x i32>, i32 immarg, i32 immarg) #3
				declare float @llvm.amdgcn.image.load.1d.f32.i16(i32 immarg, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare float @llvm.amdgcn.image.load.2d.f32.i16(i32 immarg, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare <2 x float> @llvm.amdgcn.image.load.1d.v2f32.i16(i32 immarg, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare void @llvm.amdgcn.image.store.1d.f32.i16(float, i32 immarg, i16, <8 x i32>, i32 immarg, i32 immarg) #2
				declare void @llvm.amdgcn.image.store.1d.v2f32.i16(<2 x float>, i32 immarg, i16, <8 x i32>, i32 immarg, i32 immarg) #2
				declare { <4 x float>, i32 } @llvm.amdgcn.image.load.1d.sl_v4f32i32s.i16(i32 immarg, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare { <4 x float>, i32 } @llvm.amdgcn.image.load.2d.sl_v4f32i32s.i16(i32 immarg, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare { <4 x float>, i32 } @llvm.amdgcn.image.load.3d.sl_v4f32i32s.i16(i32 immarg, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1
				declare { <4 x float>, i32 } @llvm.amdgcn.image.load.2darraymsaa.sl_v4f32i32s.i16(i32 immarg, i16, i16, i16, i16, <8 x i32>, i32 immarg, i32 immarg) #1

				attributes #0 = { nounwind }
				attributes #1 = { nounwind readonly }
				attributes #2 = { nounwind writeonly }
				attributes #3 = { nounwind readnone }