This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Fixes for 'LOD bias' operand in ISelDAG path and GobalISel path when A16-bit is 'ON'
ClosedPublic

Authored by Ravi on Oct 13 2021, 12:21 PM.


Details

Reviewers

nhaehnle
arsenm
cdevadas
critson
rampitec
dstuttard
sameerds

Commits

rGc680fb69d6ae: [AMDGPU] Fixes in ISelDAG path and GlobalISel path for 'bias' operand with A16…

Summary

Background: https://reviews.llvm.org/D74314

The LOD bias operand is of type 'half' for MIMG instructions when the A16 bit is on. Although 'bias' is only 16 bits wide, it occupies a full 32-bit dword whose upper 16 bits contain junk. The patch fixes both paths (ISelDAG and GlobalISel) so that the LOD bias operand is encoded properly. The fix could have been done in one of two ways.
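A small bit-level model may help show what goes wrong. The sketch below assumes that, with A16 on, the hardware reads only the low 16 bits of the bias dword as an IEEE half; `decodeHalf`, `biasSeenByHardware`, `packOldF32Bias` and `packNewHalfBias` are hypothetical helpers, not LLVM code. It contrasts the old test inputs (a 32-bit `float` bias bitcast to `<2 x s16>`) with the fixed encoding (a `half` bias in the low half and an undefined upper half).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Decode an IEEE-754 binary16 bit pattern into a float.
float decodeHalf(std::uint16_t H) {
  int Sign = (H >> 15) & 0x1;
  int Exp = (H >> 10) & 0x1F;
  int Mant = H & 0x3FF;
  float Value;
  if (Exp == 0) // Zero or subnormal: Mant * 2^-24.
    Value = std::ldexp(static_cast<float>(Mant), -24);
  else if (Exp == 31) // Infinity or NaN.
    Value = Mant == 0 ? INFINITY : NAN;
  else // Normal: (1024 + Mant) * 2^(Exp - 25).
    Value = std::ldexp(static_cast<float>(1024 + Mant), Exp - 25);
  return Sign ? -Value : Value;
}

// Assumed hardware view of the bias dword with A16 on: only the low
// 16 bits are read as a half; the upper 16 bits are ignored.
float biasSeenByHardware(std::uint32_t Dword) {
  return decodeHalf(static_cast<std::uint16_t>(Dword & 0xFFFF));
}

// Old encoding: a 32-bit float bias bitcast to <2 x s16>, so the low half
// holds the low 16 bits of the f32 pattern rather than a half value.
std::uint32_t packOldF32Bias(float Bias) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &Bias, sizeof(Bits));
  return Bits;
}

// New encoding: the half bias goes in the low 16 bits; the upper 16 bits
// are undefined, modelled here by an arbitrary caller-supplied junk value.
std::uint32_t packNewHalfBias(std::uint16_t HalfBits, std::uint16_t Junk) {
  return (static_cast<std::uint32_t>(Junk) << 16) | HalfBits;
}
```

Under this model, `biasSeenByHardware(packOldF32Bias(1.0f))` yields 0.0f because the low 16 bits of the f32 pattern 0x3F800000 are zero, whereas `biasSeenByHardware(packNewHalfBias(0x3C00, 0xBEEF))` yields 1.0f regardless of the junk value.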

Approaches:

The first approach is to add a 'bias' operand index for each MIMG instruction to the TableGen-generated image intrinsic info, use it later to identify the 'bias' operand, and fix that operand up as two packed 16-bit operands with the upper 16 bits undefined.
The second approach, which this patch implements, relies on the fact that 'bias' is the only MIMG intrinsic operand below the gradient index that can be a 16-bit incoming operand. The other image address operands that come before the gradients, 'offset' and 'z-compare', are always 32-bit regardless of the A16 bit.
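The position-based rule can be sketched as a small classifier. `PackAction` and `classifyOperand` are hypothetical names; the condition mirrors the one in this revision's GlobalISel legalizer change (index below `GradientStart`, A16 on, 16-bit type), but this is a simplified model, not the in-tree code.

```cpp
#include <cassert>

// Hypothetical classification of one image address operand (not LLVM code).
// Assumed layout: extra arguments (bias, offset, z-compare) come before
// GradientStart, gradients before CoordStart, and coordinates after that.
enum class PackAction {
  Keep32,        // Fills a whole dword on its own (offset, z-compare, ...).
  HalfWithUndef, // 16-bit bias: low half = bias, upper half = undef.
  Pack16         // 16-bit gradient or coordinate, packed two per dword.
};

PackAction classifyOperand(unsigned I, unsigned GradientStart,
                           unsigned CoordStart, bool IsA16, bool IsG16,
                           unsigned BitWidth) {
  if (I < GradientStart)
    // Below the gradients only the bias can be 16-bit, and only with A16.
    return (IsA16 && BitWidth == 16) ? PackAction::HalfWithUndef
                                     : PackAction::Keep32;
  if (I < CoordStart)
    return IsG16 ? PackAction::Pack16 : PackAction::Keep32;
  return IsA16 ? PackAction::Pack16 : PackAction::Keep32;
}
```

For `sample_c_b_1d`, for instance, index 0 (`bias`, 16-bit) classifies as `HalfWithUndef`, while index 1 (`zcompare`, 32-bit) stays `Keep32` even though A16 is on.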

Testing:
Multiple tests with A16 on and off were checked and updated in both the GlobalISel and ISelDAG paths, especially the SAMPLE and GATHER4 instruction tests. A few SAMPLE tests with A16 off were missing in both paths, but the GATHER4 tests already cover this case, so no additional tests were added.
All lit tests pass.

Observations:

The ISelDAG path generates the same code as before, with no added inefficiency. The GlobalISel path, however, adds explicit instructions to fill the upper 16 bits with junk. This should probably be analysed as a separate optimization issue, to identify where these instructions are introduced.
A separate issue occurs both with the earlier code and with this patch: the image resource constant (a group of 8 registers) is copied from one set of contiguous registers to another to satisfy alignment. These copies could be avoided with custom lowering of the formal arguments, with the specific register information reported back to the driver or encoded in the code objects.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Ravi created this revision. Oct 13 2021, 12:21 PM

Herald added subscribers: foad, wenlei, kerbowa and 2 others. Oct 13 2021, 12:21 PM

Ravi requested review of this revision. Oct 13 2021, 12:21 PM

Herald added a project: Restricted Project. Oct 13 2021, 12:21 PM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B128697: Diff 379480. Oct 13 2021, 1:11 PM

Ravi retitled this revision from "Fixes for 'LOD bias' operand in ISelDAG path and GobalISel path when A16-bit is 'ON'" to "AMDGPU: Fixes for 'LOD bias' operand in ISelDAG path and GobalISel path when A16-bit is 'ON'". Oct 13 2021, 10:07 PM

Ravi edited the summary of this revision.

Ravi added a reviewer: sameerds.

Herald added subscribers: t-tye, tpr, yaxunl, kzhuravl. Oct 13 2021, 10:07 PM

This also needs fixing in the combiner (which was introduced in D85887), i.e. checking that the bias can be losslessly converted to f16 and converting it when converting the address.
(I guess this doesn't need to be part of this change.)
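One way to implement the lossless-conversion check suggested here is an exact-representability test. The standalone sketch below (`isExactlyRepresentableAsHalf` is a hypothetical helper) accepts a float only if IEEE binary16 can hold it exactly; in-tree code would more likely call `llvm::APFloat::convert` with `APFloat::IEEEhalf()` and inspect its `losesInfo` output instead. NaN is conservatively rejected.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if X converts to IEEE binary16 with no loss.
bool isExactlyRepresentableAsHalf(float X) {
  if (X == 0.0f || std::isinf(X))
    return true; // +-0 and +-infinity exist in binary16.
  if (std::isnan(X))
    return false; // Conservatively reject NaN payloads.

  std::uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  int ExpField = (Bits >> 23) & 0xFF;
  if (ExpField == 0)
    return false; // f32 subnormals are far below the smallest f16 (2^-24).

  int E = ExpField - 127; // Unbiased exponent: X = Sig * 2^(E - 23).
  std::uint32_t Sig = (Bits & 0x7FFFFF) | 0x800000; // 24-bit significand.
  if (E > 15)
    return false; // At least 65536, above the f16 maximum (65504).

  // f16 normals (E >= -14) keep 11 significant bits; subnormals keep fewer,
  // down to a single bit at E == -24.
  int KeptBits = E >= -14 ? 11 : 11 - (-14 - E);
  if (KeptBits <= 0)
    return false; // Below 2^-24, the smallest f16 subnormal.
  int DroppedBits = 24 - KeptBits;
  return (Sig & ((1u << DroppedBits) - 1)) == 0;
}
```

For example, 1.0f, -2.5f and 65504.0f pass, while 0.1f (whose mantissa does not terminate in binary) and 65536.0f (out of range) fail.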

LGTM, although I'm not the most familiar person with images.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
Line 4242: Capitalize
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
Line 6286: Capitalize

This revision is now accepted and ready to land. Oct 18 2021, 2:26 PM

foad added inline comments. Oct 19 2021, 1:50 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-llvm.amdgcn.image.sample.a16.ll
Line 9: Why has the indentation changed in almost every test?

But the GlobalISel path adds explicit instructions to fill the upper 16-bit with junk.

I think D112064 should clean this up.

@Ravi are you still working on this?

In D111754#3137210, @foad wrote:

@Ravi are you still working on this?

I don't have commit permissions, so I checked with @cdevadas and we decided to commit after D112064 goes in. The suggestions from @arsenm have been addressed, and the indentation changes have been removed.

we decided to check-in after D112064 goes in.

OK. I have updated D112064.

Fixed the review comments and rebased onto the latest main.

Harbormaster completed remote builds in B139798: Diff 395065. Dec 17 2021, 1:56 AM

This revision was landed with ongoing or failed builds. Dec 17 2021, 2:43 AM

Closed by commit rGc680fb69d6ae: [AMDGPU] Fixes in ISelDAG path and GlobalISel path for 'bias' operand with A16… (authored by Ravi).

This revision was automatically updated to reflect the committed changes.

Ravi added a commit: rGc680fb69d6ae: [AMDGPU] Fixes in ISelDAG path and GlobalISel path for 'bias' operand with A16….

sebastian-ne mentioned this in D116038: [AMDGPU] Fix LOD bias in A16 combine. Dec 20 2021, 6:16 AM

sebastian-ne mentioned this in rG0530fdbbbb84: [AMDGPU] Fix LOD bias in A16 combine. Jan 21 2022, 3:09 AM

Revision Contents

Path                                                                           Size
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp                                 13 lines
llvm/lib/Target/AMDGPU/SIISelLowering.cpp                                      15 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-llvm.amdgcn.image.sample.a16.ll   176 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.gather4.a16.dim.ll       86 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll                  24 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.a16.dim.ll                   48 lines

Diff 395082

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

@@ first 4,231 lines not shown @@
     if (!SrcOp.isReg())
       continue; // _L to _LZ may have eliminated this.

     Register AddrReg = SrcOp.getReg();

     if ((I < Intr->GradientStart) ||
         (I >= Intr->GradientStart && I < Intr->CoordStart && !IsG16) ||
         (I >= Intr->CoordStart && !IsA16)) {
       // Handle any gradient or coordinate operands that should not be packed
+      if ((I < Intr->GradientStart) && IsA16 &&
+          (B.getMRI()->getType(AddrReg) == S16)) {
+        // Special handling of bias when A16 is on. Bias is of type half but
+        // occupies full 32-bit.
+        PackedAddrs.push_back(
+            B.buildBuildVector(V2S16, {AddrReg, B.buildUndef(S16).getReg(0)})
+                .getReg(0));
+      } else {
         AddrReg = B.buildBitcast(V2S16, AddrReg).getReg(0);
         PackedAddrs.push_back(AddrReg);
+      }
     } else {
       // Dz/dh, dz/dv and the last odd coord are packed with undef. Also, in 1D,
       // derivatives dx/dh and dx/dv are packed with undef.
       if (((I + 1) >= EndIdx) ||
           ((Intr->NumGradients / 2) % 2 == 1 &&
            (I == static_cast<unsigned>(Intr->GradientStart +
                                        (Intr->NumGradients / 2) - 1) ||
             I == static_cast<unsigned>(Intr->GradientStart +
@@ 894 more lines not shown @@
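To connect the legalizer change with the updated test checks further down, the following hypothetical sketch (`AddrOperand`, `Dword` and `packA16Addresses` are illustrative names, not LLVM code) models how A16 address operands land in 32-bit dwords. It assumes there are no gradient operands, as in the `sample_b` tests, and ignores the gradient special cases handled by the real loop.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of A16 address packing. Operands are
// extra arguments followed by coordinates starting at CoordStart.
struct AddrOperand {
  std::string Name;
  unsigned BitWidth; // 16 or 32.
};

// One 32-bit address dword as {low half, high half}.
using Dword = std::pair<std::string, std::string>;

std::vector<Dword> packA16Addresses(const std::vector<AddrOperand> &Ops,
                                    unsigned CoordStart) {
  std::vector<Dword> Out;
  for (unsigned I = 0; I < Ops.size(); ++I) {
    const AddrOperand &Op = Ops[I];
    if (I < CoordStart) {
      if (Op.BitWidth == 16)
        Out.push_back({Op.Name, "undef"}); // Bias: half + undefined upper half.
      else
        Out.push_back({Op.Name + ".lo", Op.Name + ".hi"}); // e.g. z-compare.
      continue;
    }
    // 16-bit coordinates are packed in pairs; an odd trailing one gets undef.
    if (I + 1 < Ops.size()) {
      Out.push_back({Op.Name, Ops[I + 1].Name});
      ++I;
    } else {
      Out.push_back({Op.Name, "undef"});
    }
  }
  return Out;
}
```

Under this model, `sample_b_1d` (bias, s) yields the dwords `{bias, undef}` and `{s, undef}`, and `sample_b_2d` (bias, s, t) yields `{bias, undef}` and `{s, t}`. That matches the two `G_BUILD_VECTOR_TRUNC` results feeding `G_CONCAT_VECTORS` in the updated GFX9/GFX10 checks.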

llvm/lib/Target/AMDGPU/SIISelLowering.cpp


@@ first 6,262 lines not shown @@
       if (auto *ConstantLod = dyn_cast<ConstantSDNode>(
               Op.getOperand(ArgOffset + Intr->MipIndex))) {
         if (ConstantLod->isZero()) {
           IntrOpcode = MIPMappingInfo->NONMIP; // set new opcode to variant without _mip
           VAddrEnd--; // remove 'mip'
         }
       }
     }

-    // Push back extra arguments.
-    for (unsigned I = Intr->VAddrStart; I < Intr->GradientStart; I++)
-      VAddrs.push_back(Op.getOperand(ArgOffset + I));

     // Check for 16 bit addresses or derivatives and pack if true.
     MVT VAddrVT =
         Op.getOperand(ArgOffset + Intr->GradientStart).getSimpleValueType();
     MVT VAddrScalarVT = VAddrVT.getScalarType();
     MVT GradPackVectorVT = VAddrScalarVT == MVT::f16 ? MVT::v2f16 : MVT::v2i16;
     IsG16 = VAddrScalarVT == MVT::f16 || VAddrScalarVT == MVT::i16;

     VAddrVT = Op.getOperand(ArgOffset + Intr->CoordStart).getSimpleValueType();
     VAddrScalarVT = VAddrVT.getScalarType();
     MVT AddrPackVectorVT = VAddrScalarVT == MVT::f16 ? MVT::v2f16 : MVT::v2i16;
     IsA16 = VAddrScalarVT == MVT::f16 || VAddrScalarVT == MVT::i16;

+    // Push back extra arguments.
+    for (unsigned I = Intr->VAddrStart; I < Intr->GradientStart; I++) {
+      if (IsA16 && (Op.getOperand(ArgOffset + I).getValueType() == MVT::f16)) {
+        // Special handling of bias when A16 is on. Bias is of type half but
+        // occupies full 32-bit.
+        SDValue bias = DAG.getBuildVector(
+            MVT::v2f16, DL, {Op.getOperand(ArgOffset + I), DAG.getUNDEF(MVT::f16)});
+        VAddrs.push_back(bias);
+      } else
+        VAddrs.push_back(Op.getOperand(ArgOffset + I));
+    }

     if (BaseOpcode->Gradients && !ST->hasG16() && (IsA16 != IsG16)) {
       // 16 bit gradients are supported, but are tied to the A16 control
       // so both gradients and addresses must be 16 bit
       LLVM_DEBUG(
           dbgs() << "Failed to lower image intrinsic: 16 bit addresses "
                     "require 16 bit args for both gradients and addresses");
       return Op;
     }
@@ 6,176 more lines not shown @@

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-llvm.amdgcn.image.sample.a16.ll


 ; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
 ; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -stop-after=legalizer -o - %s | FileCheck -check-prefix=GFX9 %s
 ; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -stop-after=legalizer -o - %s | FileCheck -check-prefix=GFX10 %s

 define amdgpu_ps <4 x float> @sample_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {
 ; GFX9-LABEL: name: sample_1d
 ; GFX9: bb.1.main_body:
 ; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $sgpr10, $sgpr11, $sgpr12, $sgpr13, $vgpr0
 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
 ; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
 ; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
 ; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
 ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
 ; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
 ; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
 ; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
 ; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
@@ 758 lines not shown @@ define amdgpu_ps <4 x float> @sample_c_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %clamp) {
 ; GFX10: $vgpr2 = COPY [[UV2]](s32)
 ; GFX10: $vgpr3 = COPY [[UV3]](s32)
 ; GFX10: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
 main_body:
   %v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f16(i32 15, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
   ret <4 x float> %v
 }

-define amdgpu_ps <4 x float> @sample_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s) {
+define amdgpu_ps <4 x float> @sample_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s) {
 ; GFX9-LABEL: name: sample_b_1d
 ; GFX9: bb.1.main_body:
 ; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $sgpr10, $sgpr11, $sgpr12, $sgpr13, $vgpr0, $vgpr1
 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
 ; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
 ; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
 ; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
 ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
 ; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
 ; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
 ; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
 ; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
 ; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10
 ; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11
 ; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12
 ; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13
 ; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
 ; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0
 ; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1
-; GFX9: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY12]](s32)
 ; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
-; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[DEF]](s32)
-; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BITCAST]](<2 x s16>), [[BUILD_VECTOR_TRUNC]](<2 x s16>)
+; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
+; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[DEF]](s32)
+; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
 ; GFX9: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.b.1d), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")
 ; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)
 ; GFX9: $vgpr0 = COPY [[UV]](s32)
 ; GFX9: $vgpr1 = COPY [[UV1]](s32)
 ; GFX9: $vgpr2 = COPY [[UV2]](s32)
 ; GFX9: $vgpr3 = COPY [[UV3]](s32)
 ; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
 ; GFX10-LABEL: name: sample_b_1d
@@ 10 lines not shown @@ define amdgpu_ps <4 x float> @sample_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s) {
 ; GFX10: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
 ; GFX10: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10
 ; GFX10: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11
 ; GFX10: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12
 ; GFX10: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13
 ; GFX10: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
 ; GFX10: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0
 ; GFX10: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1
-; GFX10: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY12]](s32)
 ; GFX10: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
-; GFX10: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[DEF]](s32)
-; GFX10: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BITCAST]](<2 x s16>), [[BUILD_VECTOR_TRUNC]](<2 x s16>)
+; GFX10: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
+; GFX10: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[DEF]](s32)
+; GFX10: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
 ; GFX10: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.b.1d), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")
 ; GFX10: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)
 ; GFX10: $vgpr0 = COPY [[UV]](s32)
 ; GFX10: $vgpr1 = COPY [[UV1]](s32)
 ; GFX10: $vgpr2 = COPY [[UV2]](s32)
 ; GFX10: $vgpr3 = COPY [[UV3]](s32)
 ; GFX10: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
 main_body:
-  %v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f16(i32 15, float %bias, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
+  %v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f16.f16(i32 15, half %bias, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
   ret <4 x float> %v
 }

-define amdgpu_ps <4 x float> @sample_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t) {
+define amdgpu_ps <4 x float> @sample_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %t) {
 ; GFX9-LABEL: name: sample_b_2d
 ; GFX9: bb.1.main_body:
 ; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $sgpr10, $sgpr11, $sgpr12, $sgpr13, $vgpr0, $vgpr1, $vgpr2
 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
 ; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
 ; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
 ; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
 ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
 ; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
 ; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
 ; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
 ; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
 ; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10
 ; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11
 ; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12
 ; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13
 ; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
 ; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0
 ; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1
 ; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2
-; GFX9: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY12]](s32)
-; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
-; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BITCAST]](<2 x s16>), [[BUILD_VECTOR_TRUNC]](<2 x s16>)
+; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
+; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
+; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
+; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
 ; GFX9: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.b.2d), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")
 ; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)
 ; GFX9: $vgpr0 = COPY [[UV]](s32)
 ; GFX9: $vgpr1 = COPY [[UV1]](s32)
 ; GFX9: $vgpr2 = COPY [[UV2]](s32)
 ; GFX9: $vgpr3 = COPY [[UV3]](s32)
 ; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
 ; GFX10-LABEL: name: sample_b_2d
@@ 11 lines not shown @@ define amdgpu_ps <4 x float> @sample_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %t) {
 ; GFX10: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10
 ; GFX10: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11
 ; GFX10: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12
 ; GFX10: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13
 ; GFX10: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
 ; GFX10: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0
 ; GFX10: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1
 ; GFX10: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2
-; GFX10: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY12]](s32)
-; GFX10: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
-; GFX10: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BITCAST]](<2 x s16>), [[BUILD_VECTOR_TRUNC]](<2 x s16>)
+; GFX10: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
+; GFX10: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
+; GFX10: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
+; GFX10: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
 ; GFX10: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.b.2d), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")
 ; GFX10: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)
 ; GFX10: $vgpr0 = COPY [[UV]](s32)
 ; GFX10: $vgpr1 = COPY [[UV1]](s32)
 ; GFX10: $vgpr2 = COPY [[UV2]](s32)
 ; GFX10: $vgpr3 = COPY [[UV3]](s32)
 ; GFX10: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
 main_body:
-  %v = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f16(i32 15, float %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
+  %v = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f16.f16(i32 15, half %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
   ret <4 x float> %v
 }

-define amdgpu_ps <4 x float> @sample_c_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s) {
+define amdgpu_ps <4 x float> @sample_c_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s) {
 ; GFX9-LABEL: name: sample_c_b_1d
 ; GFX9: bb.1.main_body:
 ; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $sgpr10, $sgpr11, $sgpr12, $sgpr13, $vgpr0, $vgpr1, $vgpr2
 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
 ; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
 ; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
 ; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
 ; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
 ; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
 ; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
 ; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
 ; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
 ; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10
 ; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11
 ; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12
 ; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13
 ; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0		; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0
; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1		; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1
; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2		; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2
; GFX9: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY12]](s32)
; GFX9: [[BITCAST1:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY13]](s32)
; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF		; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[DEF]](s32)		; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<6 x s16>) = G_CONCAT_VECTORS [[BITCAST]](<2 x s16>), [[BITCAST1]](<2 x s16>), [[BUILD_VECTOR_TRUNC]](<2 x s16>)		; GFX9: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY13]](s32)
		; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[DEF]](s32)
		; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<6 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BITCAST]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
; GFX9: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.c.b.1d), 15, [[CONCAT_VECTORS]](<6 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")		; GFX9: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.c.b.1d), 15, [[CONCAT_VECTORS]](<6 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")
; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)		; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)
; GFX9: $vgpr0 = COPY [[UV]](s32)		; GFX9: $vgpr0 = COPY [[UV]](s32)
; GFX9: $vgpr1 = COPY [[UV1]](s32)		; GFX9: $vgpr1 = COPY [[UV1]](s32)
; GFX9: $vgpr2 = COPY [[UV2]](s32)		; GFX9: $vgpr2 = COPY [[UV2]](s32)
; GFX9: $vgpr3 = COPY [[UV3]](s32)		; GFX9: $vgpr3 = COPY [[UV3]](s32)
; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3		; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
; GFX10-LABEL: name: sample_c_b_1d		; GFX10-LABEL: name: sample_c_b_1d
; GFX10: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10		; GFX10: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10
; GFX10: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11		; GFX10: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11
; GFX10: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12		; GFX10: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12
; GFX10: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13		; GFX10: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13
; GFX10: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)		; GFX10: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
; GFX10: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0		; GFX10: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0
; GFX10: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1		; GFX10: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1
; GFX10: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2		; GFX10: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2
; GFX10: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY12]](s32)
; GFX10: [[BITCAST1:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY13]](s32)
; GFX10: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF		; GFX10: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
; GFX10: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[DEF]](s32)		; GFX10: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
; GFX10: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.c.b.1d), 15, [[BITCAST]](<2 x s16>), [[BITCAST1]](<2 x s16>), [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")		; GFX10: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY13]](s32)
		; GFX10: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[DEF]](s32)
		; GFX10: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.c.b.1d), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BITCAST]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")
; GFX10: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)		; GFX10: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)
; GFX10: $vgpr0 = COPY [[UV]](s32)		; GFX10: $vgpr0 = COPY [[UV]](s32)
; GFX10: $vgpr1 = COPY [[UV1]](s32)		; GFX10: $vgpr1 = COPY [[UV1]](s32)
; GFX10: $vgpr2 = COPY [[UV2]](s32)		; GFX10: $vgpr2 = COPY [[UV2]](s32)
; GFX10: $vgpr3 = COPY [[UV3]](s32)		; GFX10: $vgpr3 = COPY [[UV3]](s32)
; GFX10: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3		; GFX10: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
main_body:		main_body:
%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)		%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f16.f16(i32 15, half %bias, float %zcompare, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
ret <4 x float> %v		ret <4 x float> %v
}		}

define amdgpu_ps <4 x float> @sample_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t) {		define amdgpu_ps <4 x float> @sample_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s, half %t) {
; GFX9-LABEL: name: sample_c_b_2d		; GFX9-LABEL: name: sample_c_b_2d
; GFX9: bb.1.main_body:		; GFX9: bb.1.main_body:
; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $sgpr10, $sgpr11, $sgpr12, $sgpr13, $vgpr0, $vgpr1, $vgpr2, $vgpr3		; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $sgpr10, $sgpr11, $sgpr12, $sgpr13, $vgpr0, $vgpr1, $vgpr2, $vgpr3
; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2		; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3		; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4		; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5		; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6		; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7		; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8		; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9		; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)		; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10		; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10
; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11		; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11
; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12		; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12
; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13		; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13
; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)		; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0		; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0
; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1		; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1
; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2		; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2
; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY $vgpr3		; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY $vgpr3
; GFX9: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY12]](s32)		; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
; GFX9: [[BITCAST1:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY13]](s32)		; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)		; GFX9: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY13]](s32)
; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<6 x s16>) = G_CONCAT_VECTORS [[BITCAST]](<2 x s16>), [[BITCAST1]](<2 x s16>), [[BUILD_VECTOR_TRUNC]](<2 x s16>)		; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
		; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<6 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BITCAST]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
; GFX9: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.c.b.2d), 15, [[CONCAT_VECTORS]](<6 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")		; GFX9: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.c.b.2d), 15, [[CONCAT_VECTORS]](<6 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")
; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)		; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)
; GFX9: $vgpr0 = COPY [[UV]](s32)		; GFX9: $vgpr0 = COPY [[UV]](s32)
; GFX9: $vgpr1 = COPY [[UV1]](s32)		; GFX9: $vgpr1 = COPY [[UV1]](s32)
; GFX9: $vgpr2 = COPY [[UV2]](s32)		; GFX9: $vgpr2 = COPY [[UV2]](s32)
; GFX9: $vgpr3 = COPY [[UV3]](s32)		; GFX9: $vgpr3 = COPY [[UV3]](s32)
; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3		; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
; GFX10-LABEL: name: sample_c_b_2d		; GFX10-LABEL: name: sample_c_b_2d
; GFX10: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11		; GFX10: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11
; GFX10: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12		; GFX10: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12
; GFX10: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13		; GFX10: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13
; GFX10: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)		; GFX10: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
; GFX10: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0		; GFX10: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0
; GFX10: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1		; GFX10: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1
; GFX10: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2		; GFX10: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2
; GFX10: [[COPY15:%[0-9]+]]:_(s32) = COPY $vgpr3		; GFX10: [[COPY15:%[0-9]+]]:_(s32) = COPY $vgpr3
; GFX10: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY12]](s32)		; GFX10: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
; GFX10: [[BITCAST1:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY13]](s32)		; GFX10: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
; GFX10: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)		; GFX10: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY13]](s32)
; GFX10: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.c.b.2d), 15, [[BITCAST]](<2 x s16>), [[BITCAST1]](<2 x s16>), [[BUILD_VECTOR_TRUNC]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")		; GFX10: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
		; GFX10: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.c.b.2d), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BITCAST]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")
; GFX10: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)		; GFX10: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)
; GFX10: $vgpr0 = COPY [[UV]](s32)		; GFX10: $vgpr0 = COPY [[UV]](s32)
; GFX10: $vgpr1 = COPY [[UV1]](s32)		; GFX10: $vgpr1 = COPY [[UV1]](s32)
; GFX10: $vgpr2 = COPY [[UV2]](s32)		; GFX10: $vgpr2 = COPY [[UV2]](s32)
; GFX10: $vgpr3 = COPY [[UV3]](s32)		; GFX10: $vgpr3 = COPY [[UV3]](s32)
; GFX10: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3		; GFX10: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
main_body:		main_body:
%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)		%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f16.f16(i32 15, half %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
ret <4 x float> %v		ret <4 x float> %v
}		}

define amdgpu_ps <4 x float> @sample_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %clamp) {		define amdgpu_ps <4 x float> @sample_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %clamp) {
; GFX9-LABEL: name: sample_b_cl_1d		; GFX9-LABEL: name: sample_b_cl_1d
; GFX9: bb.1.main_body:		; GFX9: bb.1.main_body:
; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $sgpr10, $sgpr11, $sgpr12, $sgpr13, $vgpr0, $vgpr1, $vgpr2		; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $sgpr10, $sgpr11, $sgpr12, $sgpr13, $vgpr0, $vgpr1, $vgpr2
; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2		; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3		; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4		; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5		; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6		; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7		; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8		; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9		; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)		; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10		; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10
; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11		; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11
; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12		; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12
; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13		; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13
; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)		; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0		; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0
; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1		; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1
; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2		; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2
; GFX9: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY12]](s32)		; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)		; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BITCAST]](<2 x s16>), [[BUILD_VECTOR_TRUNC]](<2 x s16>)		; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
		; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
; GFX9: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.b.cl.1d), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")		; GFX9: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.b.cl.1d), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")
; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)		; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)
; GFX9: $vgpr0 = COPY [[UV]](s32)		; GFX9: $vgpr0 = COPY [[UV]](s32)
; GFX9: $vgpr1 = COPY [[UV1]](s32)		; GFX9: $vgpr1 = COPY [[UV1]](s32)
; GFX9: $vgpr2 = COPY [[UV2]](s32)		; GFX9: $vgpr2 = COPY [[UV2]](s32)
; GFX9: $vgpr3 = COPY [[UV3]](s32)		; GFX9: $vgpr3 = COPY [[UV3]](s32)
; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3		; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
; GFX10-LABEL: name: sample_b_cl_1d		; GFX10-LABEL: name: sample_b_cl_1d
; GFX10: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10		; GFX10: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10
; GFX10: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11		; GFX10: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11
; GFX10: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12		; GFX10: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12
; GFX10: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13		; GFX10: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13
; GFX10: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)		; GFX10: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
; GFX10: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0		; GFX10: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0
; GFX10: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1		; GFX10: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1
; GFX10: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2		; GFX10: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2
; GFX10: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY12]](s32)		; GFX10: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
; GFX10: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)		; GFX10: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
; GFX10: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BITCAST]](<2 x s16>), [[BUILD_VECTOR_TRUNC]](<2 x s16>)		; GFX10: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
		; GFX10: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
; GFX10: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.b.cl.1d), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")		; GFX10: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.b.cl.1d), 15, [[CONCAT_VECTORS]](<4 x s16>), $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")
; GFX10: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)		; GFX10: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)
; GFX10: $vgpr0 = COPY [[UV]](s32)		; GFX10: $vgpr0 = COPY [[UV]](s32)
; GFX10: $vgpr1 = COPY [[UV1]](s32)		; GFX10: $vgpr1 = COPY [[UV1]](s32)
; GFX10: $vgpr2 = COPY [[UV2]](s32)		; GFX10: $vgpr2 = COPY [[UV2]](s32)
; GFX10: $vgpr3 = COPY [[UV3]](s32)		; GFX10: $vgpr3 = COPY [[UV3]](s32)
; GFX10: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3		; GFX10: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
main_body:		main_body:
%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f16(i32 15, float %bias, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)		%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f16.f16(i32 15, half %bias, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
ret <4 x float> %v		ret <4 x float> %v
}		}

define amdgpu_ps <4 x float> @sample_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t, half %clamp) {		define amdgpu_ps <4 x float> @sample_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %t, half %clamp) {
; GFX9-LABEL: name: sample_b_cl_2d		; GFX9-LABEL: name: sample_b_cl_2d
; GFX9: bb.1.main_body:		; GFX9: bb.1.main_body:
; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $sgpr10, $sgpr11, $sgpr12, $sgpr13, $vgpr0, $vgpr1, $vgpr2, $vgpr3		; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $sgpr10, $sgpr11, $sgpr12, $sgpr13, $vgpr0, $vgpr1, $vgpr2, $vgpr3
; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2		; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3		; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4		; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5		; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6		; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7		; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8		; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9		; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)		; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10		; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10
; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11		; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11
; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12		; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12
; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13		; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13
; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)		; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0		; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0
; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1		; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1
; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2		; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2
; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY $vgpr3		; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY $vgpr3
; GFX9: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY12]](s32)
; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF		; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY15]](s32), [[DEF]](s32)		; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<6 x s16>) = G_CONCAT_VECTORS [[BITCAST]](<2 x s16>), [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)		; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
		; GFX9: [[BUILD_VECTOR_TRUNC2:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY15]](s32), [[DEF]](s32)
		; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<6 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), [[BUILD_VECTOR_TRUNC2]](<2 x s16>)
; GFX9: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.b.cl.2d), 15, [[CONCAT_VECTORS]](<6 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")		; GFX9: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.b.cl.2d), 15, [[CONCAT_VECTORS]](<6 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")
; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)		; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)
; GFX9: $vgpr0 = COPY [[UV]](s32)		; GFX9: $vgpr0 = COPY [[UV]](s32)
; GFX9: $vgpr1 = COPY [[UV1]](s32)		; GFX9: $vgpr1 = COPY [[UV1]](s32)
; GFX9: $vgpr2 = COPY [[UV2]](s32)		; GFX9: $vgpr2 = COPY [[UV2]](s32)
; GFX9: $vgpr3 = COPY [[UV3]](s32)		; GFX9: $vgpr3 = COPY [[UV3]](s32)
; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3		; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
; GFX10-LABEL: name: sample_b_cl_2d		; GFX10-LABEL: name: sample_b_cl_2d
; GFX10: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11		; GFX10: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11
; GFX10: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12		; GFX10: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12
; GFX10: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13		; GFX10: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13
; GFX10: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)		; GFX10: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
; GFX10: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0		; GFX10: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0
; GFX10: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1		; GFX10: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1
; GFX10: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2		; GFX10: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2
; GFX10: [[COPY15:%[0-9]+]]:_(s32) = COPY $vgpr3		; GFX10: [[COPY15:%[0-9]+]]:_(s32) = COPY $vgpr3
; GFX10: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY12]](s32)
; GFX10: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
; GFX10: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF		; GFX10: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
; GFX10: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY15]](s32), [[DEF]](s32)		; GFX10: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
; GFX10: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.b.cl.2d), 15, [[BITCAST]](<2 x s16>), [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")		; GFX10: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY13]](s32), [[COPY14]](s32)
		; GFX10: [[BUILD_VECTOR_TRUNC2:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY15]](s32), [[DEF]](s32)
		; GFX10: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.b.cl.2d), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), [[BUILD_VECTOR_TRUNC2]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")
; GFX10: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)		; GFX10: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)
; GFX10: $vgpr0 = COPY [[UV]](s32)		; GFX10: $vgpr0 = COPY [[UV]](s32)
; GFX10: $vgpr1 = COPY [[UV1]](s32)		; GFX10: $vgpr1 = COPY [[UV1]](s32)
; GFX10: $vgpr2 = COPY [[UV2]](s32)		; GFX10: $vgpr2 = COPY [[UV2]](s32)
; GFX10: $vgpr3 = COPY [[UV3]](s32)		; GFX10: $vgpr3 = COPY [[UV3]](s32)
; GFX10: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3		; GFX10: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
main_body:		main_body:
%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f16(i32 15, float %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)		%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f16.f16(i32 15, half %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
ret <4 x float> %v		ret <4 x float> %v
}		}
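Read as a data layout, the new GFX9 and GFX10 checks for sample_b_cl_2d pack the four half operands into three dwords. The bias sits alone with an undefined upper half (G_BUILD_VECTOR_TRUNC with G_IMPLICIT_DEF), s and t share a dword, and clamp is padded with an undefined upper half, which gives the <6 x s16> concat. The Python below is a toy model of that layout, not LLVM code. The function name pack_a16_vaddr and the "lo|hi" string encoding are hypothetical, and the model assumes, as the summary states, that offset and zcompare are always 32-bit.

```python
from typing import List, Optional, Tuple

UNDEF = "undef"  # stands in for G_IMPLICIT_DEF in an unused upper 16 bits


def pack_a16_vaddr(operands: List[Tuple[str, int]]) -> List[str]:
    """Toy model of A16 vaddr packing as it appears in the post-patch checks.

    operands: (name, bit_width) pairs in intrinsic order.
    Returns one string per 32-bit dword: "lo|hi" for packed halves,
    or the bare name for a 32-bit operand.
    """
    dwords: List[str] = []
    pending: Optional[str] = None  # 16-bit coordinate waiting for a partner

    def flush() -> None:
        # Pad a lone 16-bit coordinate with an undefined upper half.
        nonlocal pending
        if pending is not None:
            dwords.append(f"{pending}|{UNDEF}")
            pending = None

    for name, bits in operands:
        if bits == 32:
            # 32-bit operands (offset, zcompare) fill a whole dword (G_BITCAST).
            flush()
            dwords.append(name)
        elif name == "bias":
            # The 16-bit bias never shares a dword; its upper half is undefined.
            flush()
            dwords.append(f"bias|{UNDEF}")
        elif pending is None:
            pending = name
        else:
            dwords.append(f"{pending}|{name}")
            pending = None
    flush()
    return dwords


# sample_b_cl_2d: half %bias, half %s, half %t, half %clamp
layout = pack_a16_vaddr([("bias", 16), ("s", 16), ("t", 16), ("clamp", 16)])
print(layout)                        # ['bias|undef', 's|t', 'clamp|undef']
print(f"<{2 * len(layout)} x s16>")  # <6 x s16>
```

Feeding it the sample_c_b_cl_2d operands (bias, a 32-bit zcompare, s, t, clamp) gives four dwords, which lines up with the <8 x s16> G_CONCAT_VECTORS in that test's new GFX9 checks.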

define amdgpu_ps <4 x float> @sample_c_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %clamp) {		define amdgpu_ps <4 x float> @sample_c_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s, half %clamp) {
; GFX9-LABEL: name: sample_c_b_cl_1d		; GFX9-LABEL: name: sample_c_b_cl_1d
; GFX9: bb.1.main_body:		; GFX9: bb.1.main_body:
; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $sgpr10, $sgpr11, $sgpr12, $sgpr13, $vgpr0, $vgpr1, $vgpr2, $vgpr3		; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $sgpr10, $sgpr11, $sgpr12, $sgpr13, $vgpr0, $vgpr1, $vgpr2, $vgpr3
; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2		; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3		; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4		; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5		; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6		; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7		; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8		; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9		; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)		; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10		; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10
; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11		; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11
; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12		; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12
; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13		; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13
; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)		; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0		; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0
; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1		; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1
; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2		; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2
; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY $vgpr3		; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY $vgpr3
; GFX9: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY12]](s32)		; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
; GFX9: [[BITCAST1:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY13]](s32)		; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)		; GFX9: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY13]](s32)
; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<6 x s16>) = G_CONCAT_VECTORS [[BITCAST]](<2 x s16>), [[BITCAST1]](<2 x s16>), [[BUILD_VECTOR_TRUNC]](<2 x s16>)		; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
		; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<6 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BITCAST]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)
; GFX9: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.c.b.cl.1d), 15, [[CONCAT_VECTORS]](<6 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")		; GFX9: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.c.b.cl.1d), 15, [[CONCAT_VECTORS]](<6 x s16>), $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")
; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)		; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)
; GFX9: $vgpr0 = COPY [[UV]](s32)		; GFX9: $vgpr0 = COPY [[UV]](s32)
; GFX9: $vgpr1 = COPY [[UV1]](s32)		; GFX9: $vgpr1 = COPY [[UV1]](s32)
; GFX9: $vgpr2 = COPY [[UV2]](s32)		; GFX9: $vgpr2 = COPY [[UV2]](s32)
; GFX9: $vgpr3 = COPY [[UV3]](s32)		; GFX9: $vgpr3 = COPY [[UV3]](s32)
; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3		; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
; GFX10-LABEL: name: sample_c_b_cl_1d		; GFX10-LABEL: name: sample_c_b_cl_1d
(12 lines not shown)
; GFX10: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11		; GFX10: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11
; GFX10: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12		; GFX10: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12
; GFX10: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13		; GFX10: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13
; GFX10: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)		; GFX10: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
; GFX10: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0		; GFX10: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0
; GFX10: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1		; GFX10: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1
; GFX10: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2		; GFX10: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2
; GFX10: [[COPY15:%[0-9]+]]:_(s32) = COPY $vgpr3		; GFX10: [[COPY15:%[0-9]+]]:_(s32) = COPY $vgpr3
; GFX10: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY12]](s32)		; GFX10: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
; GFX10: [[BITCAST1:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY13]](s32)		; GFX10: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
; GFX10: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)		; GFX10: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY13]](s32)
; GFX10: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.c.b.cl.1d), 15, [[BITCAST]](<2 x s16>), [[BITCAST1]](<2 x s16>), [[BUILD_VECTOR_TRUNC]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")		; GFX10: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
		; GFX10: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.c.b.cl.1d), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BITCAST]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")
; GFX10: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)		; GFX10: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)
; GFX10: $vgpr0 = COPY [[UV]](s32)		; GFX10: $vgpr0 = COPY [[UV]](s32)
; GFX10: $vgpr1 = COPY [[UV1]](s32)		; GFX10: $vgpr1 = COPY [[UV1]](s32)
; GFX10: $vgpr2 = COPY [[UV2]](s32)		; GFX10: $vgpr2 = COPY [[UV2]](s32)
; GFX10: $vgpr3 = COPY [[UV3]](s32)		; GFX10: $vgpr3 = COPY [[UV3]](s32)
; GFX10: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3		; GFX10: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
main_body:		main_body:
%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)		%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f16.f16(i32 15, half %bias, float %zcompare, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
ret <4 x float> %v		ret <4 x float> %v
}		}
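The summary identifies the bias by position and width: with A16 on, it is the only 16-bit operand below the gradient start, because offset and zcompare stay 32-bit. In sample_c_b_cl_1d that is why the half %bias gets its own padded dword while the neighbouring float %zcompare is passed through whole with G_BITCAST. The sketch below expresses that check in Python; find_bias_index and gradient_start are illustrative names, not the fields of LLVM's TableGen-generated image intrinsic table.

```python
from typing import List, Optional


def find_bias_index(vaddr_types: List[str], gradient_start: int) -> Optional[int]:
    """Return the index of the 16-bit bias among operands before gradient_start.

    vaddr_types: IR type names of the address operands, in intrinsic order.
    gradient_start: index where gradients (or coordinates, if none) begin.
    Assumes offset and zcompare are always 32-bit, per the patch summary.
    """
    candidates = [i for i, ty in enumerate(vaddr_types[:gradient_start]) if ty == "half"]
    if len(candidates) > 1:
        raise ValueError("expected at most one 16-bit operand before the gradients")
    return candidates[0] if candidates else None


# sample_c_b_cl_1d (new): half %bias, float %zcompare | half %s, half %clamp
print(find_bias_index(["half", "float", "half", "half"], gradient_start=2))  # 0
# gather4_c_cl_2d has no bias: float %zcompare | half %s, half %t, half %clamp
print(find_bias_index(["float", "half", "half", "half"], gradient_start=1))  # None
```

The old version of sample_c_b_cl_1d passed a float %bias, so the same check finds no 16-bit operand there and both extra arguments are treated as full dwords.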

define amdgpu_ps <4 x float> @sample_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t, half %clamp) {		define amdgpu_ps <4 x float> @sample_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s, half %t, half %clamp) {
; GFX9-LABEL: name: sample_c_b_cl_2d		; GFX9-LABEL: name: sample_c_b_cl_2d
; GFX9: bb.1.main_body:		; GFX9: bb.1.main_body:
; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $sgpr10, $sgpr11, $sgpr12, $sgpr13, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4		; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $sgpr10, $sgpr11, $sgpr12, $sgpr13, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2		; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3		; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $sgpr3
; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4		; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $sgpr4
; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5		; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr5
; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6		; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY $sgpr6
; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7		; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY $sgpr7
; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8		; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY $sgpr8
; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9		; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY $sgpr9
; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)		; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<8 x s32>) = G_BUILD_VECTOR [[COPY]](s32), [[COPY1]](s32), [[COPY2]](s32), [[COPY3]](s32), [[COPY4]](s32), [[COPY5]](s32), [[COPY6]](s32), [[COPY7]](s32)
; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10		; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY $sgpr10
; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11		; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY $sgpr11
; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12		; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12
; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13		; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13
; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)		; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0		; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0
; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1		; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1
; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2		; GFX9: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2
; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY $vgpr3		; GFX9: [[COPY15:%[0-9]+]]:_(s32) = COPY $vgpr3
; GFX9: [[COPY16:%[0-9]+]]:_(s32) = COPY $vgpr4		; GFX9: [[COPY16:%[0-9]+]]:_(s32) = COPY $vgpr4
; GFX9: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY12]](s32)
; GFX9: [[BITCAST1:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY13]](s32)
; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF		; GFX9: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[DEF]](s32)		; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<8 x s16>) = G_CONCAT_VECTORS [[BITCAST]](<2 x s16>), [[BITCAST1]](<2 x s16>), [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>)		; GFX9: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY13]](s32)
		; GFX9: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
		; GFX9: [[BUILD_VECTOR_TRUNC2:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[DEF]](s32)
		; GFX9: [[CONCAT_VECTORS:%[0-9]+]]:_(<8 x s16>) = G_CONCAT_VECTORS [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BITCAST]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), [[BUILD_VECTOR_TRUNC2]](<2 x s16>)
; GFX9: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.c.b.cl.2d), 15, [[CONCAT_VECTORS]](<8 x s16>), $noreg, $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")		; GFX9: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.c.b.cl.2d), 15, [[CONCAT_VECTORS]](<8 x s16>), $noreg, $noreg, $noreg, $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")
; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)		; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)
; GFX9: $vgpr0 = COPY [[UV]](s32)		; GFX9: $vgpr0 = COPY [[UV]](s32)
; GFX9: $vgpr1 = COPY [[UV1]](s32)		; GFX9: $vgpr1 = COPY [[UV1]](s32)
; GFX9: $vgpr2 = COPY [[UV2]](s32)		; GFX9: $vgpr2 = COPY [[UV2]](s32)
; GFX9: $vgpr3 = COPY [[UV3]](s32)		; GFX9: $vgpr3 = COPY [[UV3]](s32)
; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3		; GFX9: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
; GFX10-LABEL: name: sample_c_b_cl_2d		; GFX10-LABEL: name: sample_c_b_cl_2d
(13 lines not shown)
; GFX10: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12		; GFX10: [[COPY10:%[0-9]+]]:_(s32) = COPY $sgpr12
; GFX10: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13		; GFX10: [[COPY11:%[0-9]+]]:_(s32) = COPY $sgpr13
; GFX10: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)		; GFX10: [[BUILD_VECTOR1:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32), [[COPY10]](s32), [[COPY11]](s32)
; GFX10: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0		; GFX10: [[COPY12:%[0-9]+]]:_(s32) = COPY $vgpr0
; GFX10: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1		; GFX10: [[COPY13:%[0-9]+]]:_(s32) = COPY $vgpr1
; GFX10: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2		; GFX10: [[COPY14:%[0-9]+]]:_(s32) = COPY $vgpr2
; GFX10: [[COPY15:%[0-9]+]]:_(s32) = COPY $vgpr3		; GFX10: [[COPY15:%[0-9]+]]:_(s32) = COPY $vgpr3
; GFX10: [[COPY16:%[0-9]+]]:_(s32) = COPY $vgpr4		; GFX10: [[COPY16:%[0-9]+]]:_(s32) = COPY $vgpr4
; GFX10: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY12]](s32)
; GFX10: [[BITCAST1:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY13]](s32)
; GFX10: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
; GFX10: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF		; GFX10: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
; GFX10: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[DEF]](s32)		; GFX10: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY12]](s32), [[DEF]](s32)
; GFX10: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.c.b.cl.2d), 15, [[BITCAST]](<2 x s16>), [[BITCAST1]](<2 x s16>), [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")		; GFX10: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[COPY13]](s32)
		; GFX10: [[BUILD_VECTOR_TRUNC1:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY14]](s32), [[COPY15]](s32)
		; GFX10: [[BUILD_VECTOR_TRUNC2:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY16]](s32), [[DEF]](s32)
		; GFX10: [[AMDGPU_INTRIN_IMAGE_LOAD:%[0-9]+]]:_(<4 x s32>) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.c.b.cl.2d), 15, [[BUILD_VECTOR_TRUNC]](<2 x s16>), [[BITCAST]](<2 x s16>), [[BUILD_VECTOR_TRUNC1]](<2 x s16>), [[BUILD_VECTOR_TRUNC2]](<2 x s16>), $noreg, [[BUILD_VECTOR]](<8 x s32>), [[BUILD_VECTOR1]](<4 x s32>), 0, 0, 0, 3 :: (dereferenceable load (<4 x s32>) from custom "ImageResource")
; GFX10: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)		; GFX10: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[AMDGPU_INTRIN_IMAGE_LOAD]](<4 x s32>)
; GFX10: $vgpr0 = COPY [[UV]](s32)		; GFX10: $vgpr0 = COPY [[UV]](s32)
; GFX10: $vgpr1 = COPY [[UV1]](s32)		; GFX10: $vgpr1 = COPY [[UV1]](s32)
; GFX10: $vgpr2 = COPY [[UV2]](s32)		; GFX10: $vgpr2 = COPY [[UV2]](s32)
; GFX10: $vgpr3 = COPY [[UV3]](s32)		; GFX10: $vgpr3 = COPY [[UV3]](s32)
; GFX10: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3		; GFX10: SI_RETURN_TO_EPILOG implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
main_body:		main_body:
%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)		%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f16.f16(i32 15, half %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
ret <4 x float> %v		ret <4 x float> %v
}		}

define amdgpu_ps <4 x float> @sample_d_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s) {		define amdgpu_ps <4 x float> @sample_d_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s) {
; GFX9-LABEL: name: sample_d_1d		; GFX9-LABEL: name: sample_d_1d
; GFX9: bb.1.main_body:		; GFX9: bb.1.main_body:
; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $sgpr10, $sgpr11, $sgpr12, $sgpr13, $vgpr0, $vgpr1, $vgpr2		; GFX9: liveins: $sgpr2, $sgpr3, $sgpr4, $sgpr5, $sgpr6, $sgpr7, $sgpr8, $sgpr9, $sgpr10, $sgpr11, $sgpr12, $sgpr13, $vgpr0, $vgpr1, $vgpr2
; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2		; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr2
(1,969 lines not shown)

declare <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f16(i32, float, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f16(i32, float, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f16(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f16(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.cl.1d.v4f32.f16(i32, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.cl.1d.v4f32.f16(i32, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.cl.2d.v4f32.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.cl.2d.v4f32.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.c.cl.1d.v4f32.f16(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.c.cl.1d.v4f32.f16(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f16(i32, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f16(i32, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1

declare <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f16(i32, float, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f16.f16(i32, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f16(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f16.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f16(i32, float, float, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f16.f16(i32, half, float, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f16(i32, float, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f16.f16(i32, half, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f16(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f16.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f16(i32, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f16.f16(i32, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f16(i32, float, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f16.f16(i32, half, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f16(i32, float, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f16.f16(i32, half, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1

declare <4 x float> @llvm.amdgcn.image.sample.d.1d.v4f32.f16.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.d.1d.v4f32.f16.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f16.f16(i32, half, half, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f16.f16(i32, half, half, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.d.3d.v4f32.f16.f16(i32, half, half, half, half, half, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.d.3d.v4f32.f16.f16(i32, half, half, half, half, half, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.c.d.1d.v4f32.f32.f16(i32, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.c.d.1d.v4f32.f32.f16(i32, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f32.f16(i32, float, half, half, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f32.f16(i32, float, half, half, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.d.cl.1d.v4f32.f16.f16(i32, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.d.cl.1d.v4f32.f16.f16(i32, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
declare <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f16.f16(i32, half, half, half, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1		declare <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f16.f16(i32, half, half, half, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
(28 lines not shown)

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.gather4.a16.dim.ll

(328 lines not shown)
; GFX10NSA-NEXT: image_gather4_c_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16		; GFX10NSA-NEXT: image_gather4_c_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
; GFX10NSA-NEXT: s_waitcnt vmcnt(0)		; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
; GFX10NSA-NEXT: ; return to shader part epilog		; GFX10NSA-NEXT: ; return to shader part epilog
main_body:		main_body:
%v = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.2d.v4f32.f16(i32 1, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)		%v = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.2d.v4f32.f16(i32 1, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
ret <4 x float> %v		ret <4 x float> %v
}		}

define amdgpu_ps <4 x float> @gather4_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t) {		define amdgpu_ps <4 x float> @gather4_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %t) {
; GFX9-LABEL: gather4_b_2d:		; GFX9-LABEL: gather4_b_2d:
; GFX9: ; %bb.0: ; %main_body		; GFX9: ; %bb.0: ; %main_body
; GFX9-NEXT: s_mov_b64 s[14:15], exec		; GFX9-NEXT: s_mov_b64 s[14:15], exec
; GFX9-NEXT: s_mov_b32 s0, s2		; GFX9-NEXT: s_mov_b32 s0, s2
; GFX9-NEXT: s_wqm_b64 exec, exec		; GFX9-NEXT: s_wqm_b64 exec, exec
		; GFX9-NEXT: s_mov_b32 s2, s4
		; GFX9-NEXT: s_mov_b32 s4, s6
		; GFX9-NEXT: s_mov_b32 s6, s8
		; GFX9-NEXT: s_mov_b32 s8, s10
		; GFX9-NEXT: s_mov_b32 s10, s12
; GFX9-NEXT: v_mov_b32_e32 v3, 0xffff		; GFX9-NEXT: v_mov_b32_e32 v3, 0xffff
		; GFX9-NEXT: s_lshl_b32 s12, s0, 16
; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2		; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2
; GFX9-NEXT: s_mov_b32 s1, s3		; GFX9-NEXT: s_mov_b32 s1, s3
; GFX9-NEXT: s_mov_b32 s2, s4
; GFX9-NEXT: s_mov_b32 s3, s5		; GFX9-NEXT: s_mov_b32 s3, s5
; GFX9-NEXT: s_mov_b32 s4, s6
; GFX9-NEXT: s_mov_b32 s5, s7		; GFX9-NEXT: s_mov_b32 s5, s7
; GFX9-NEXT: s_mov_b32 s6, s8
; GFX9-NEXT: s_mov_b32 s7, s9		; GFX9-NEXT: s_mov_b32 s7, s9
; GFX9-NEXT: s_mov_b32 s8, s10
; GFX9-NEXT: s_mov_b32 s9, s11		; GFX9-NEXT: s_mov_b32 s9, s11
; GFX9-NEXT: s_mov_b32 s10, s12
; GFX9-NEXT: s_mov_b32 s11, s13		; GFX9-NEXT: s_mov_b32 s11, s13
		; GFX9-NEXT: v_and_or_b32 v0, v0, v3, s12
; GFX9-NEXT: v_and_or_b32 v1, v1, v3, v2		; GFX9-NEXT: v_and_or_b32 v1, v1, v3, v2
; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]		; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
; GFX9-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16		; GFX9-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: ; return to shader part epilog		; GFX9-NEXT: ; return to shader part epilog
;		;
; GFX10NSA-LABEL: gather4_b_2d:		; GFX10NSA-LABEL: gather4_b_2d:
; GFX10NSA: ; %bb.0: ; %main_body		; GFX10NSA: ; %bb.0: ; %main_body
; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo		; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
; GFX10NSA-NEXT: s_mov_b32 s0, s2		; GFX10NSA-NEXT: s_mov_b32 s0, s2
; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo		; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
		; GFX10NSA-NEXT: v_mov_b32_e32 v3, 0xffff
; GFX10NSA-NEXT: v_lshlrev_b32_e32 v2, 16, v2		; GFX10NSA-NEXT: v_lshlrev_b32_e32 v2, 16, v2
; GFX10NSA-NEXT: s_mov_b32 s1, s3
; GFX10NSA-NEXT: s_mov_b32 s2, s4		; GFX10NSA-NEXT: s_mov_b32 s2, s4
; GFX10NSA-NEXT: s_mov_b32 s3, s5
; GFX10NSA-NEXT: s_mov_b32 s4, s6		; GFX10NSA-NEXT: s_mov_b32 s4, s6
; GFX10NSA-NEXT: s_mov_b32 s5, s7
; GFX10NSA-NEXT: s_mov_b32 s6, s8		; GFX10NSA-NEXT: s_mov_b32 s6, s8
; GFX10NSA-NEXT: s_mov_b32 s7, s9
; GFX10NSA-NEXT: s_mov_b32 s8, s10		; GFX10NSA-NEXT: s_mov_b32 s8, s10
; GFX10NSA-NEXT: s_mov_b32 s9, s11
; GFX10NSA-NEXT: s_mov_b32 s10, s12		; GFX10NSA-NEXT: s_mov_b32 s10, s12
		; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16
		; GFX10NSA-NEXT: s_mov_b32 s1, s3
		; GFX10NSA-NEXT: s_mov_b32 s3, s5
		; GFX10NSA-NEXT: s_mov_b32 s5, s7
		; GFX10NSA-NEXT: s_mov_b32 s7, s9
		; GFX10NSA-NEXT: s_mov_b32 s9, s11
; GFX10NSA-NEXT: s_mov_b32 s11, s13		; GFX10NSA-NEXT: s_mov_b32 s11, s13
; GFX10NSA-NEXT: v_and_or_b32 v1, 0xffff, v1, v2		; GFX10NSA-NEXT: v_and_or_b32 v0, v0, v3, s12
		; GFX10NSA-NEXT: v_and_or_b32 v1, v1, v3, v2
; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14		; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
; GFX10NSA-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16		; GFX10NSA-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
; GFX10NSA-NEXT: s_waitcnt vmcnt(0)		; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
; GFX10NSA-NEXT: ; return to shader part epilog		; GFX10NSA-NEXT: ; return to shader part epilog
main_body:		main_body:
%v = call <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f32.f16(i32 1, float %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)		%v = call <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f16.f16(i32 1, half %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
ret <4 x float> %v		ret <4 x float> %v
}		}

define amdgpu_ps <4 x float> @gather4_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t) {		define amdgpu_ps <4 x float> @gather4_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s, half %t) {
; GFX9-LABEL: gather4_c_b_2d:		; GFX9-LABEL: gather4_c_b_2d:
; GFX9: ; %bb.0: ; %main_body		; GFX9: ; %bb.0: ; %main_body
; GFX9-NEXT: s_mov_b64 s[14:15], exec		; GFX9-NEXT: s_mov_b64 s[14:15], exec
; GFX9-NEXT: s_mov_b32 s0, s2		; GFX9-NEXT: s_mov_b32 s0, s2
; GFX9-NEXT: s_wqm_b64 exec, exec		; GFX9-NEXT: s_wqm_b64 exec, exec
		; GFX9-NEXT: s_mov_b32 s2, s4
		; GFX9-NEXT: s_mov_b32 s4, s6
		; GFX9-NEXT: s_mov_b32 s6, s8
		; GFX9-NEXT: s_mov_b32 s8, s10
		; GFX9-NEXT: s_mov_b32 s10, s12
; GFX9-NEXT: v_mov_b32_e32 v4, 0xffff		; GFX9-NEXT: v_mov_b32_e32 v4, 0xffff
		; GFX9-NEXT: s_lshl_b32 s12, s0, 16
; GFX9-NEXT: v_lshlrev_b32_e32 v3, 16, v3		; GFX9-NEXT: v_lshlrev_b32_e32 v3, 16, v3
; GFX9-NEXT: s_mov_b32 s1, s3		; GFX9-NEXT: s_mov_b32 s1, s3
; GFX9-NEXT: s_mov_b32 s2, s4
; GFX9-NEXT: s_mov_b32 s3, s5		; GFX9-NEXT: s_mov_b32 s3, s5
; GFX9-NEXT: s_mov_b32 s4, s6
; GFX9-NEXT: s_mov_b32 s5, s7		; GFX9-NEXT: s_mov_b32 s5, s7
; GFX9-NEXT: s_mov_b32 s6, s8
; GFX9-NEXT: s_mov_b32 s7, s9		; GFX9-NEXT: s_mov_b32 s7, s9
; GFX9-NEXT: s_mov_b32 s8, s10
; GFX9-NEXT: s_mov_b32 s9, s11		; GFX9-NEXT: s_mov_b32 s9, s11
; GFX9-NEXT: s_mov_b32 s10, s12
; GFX9-NEXT: s_mov_b32 s11, s13		; GFX9-NEXT: s_mov_b32 s11, s13
		; GFX9-NEXT: v_and_or_b32 v0, v0, v4, s12
; GFX9-NEXT: v_and_or_b32 v2, v2, v4, v3		; GFX9-NEXT: v_and_or_b32 v2, v2, v4, v3
; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]		; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
; GFX9-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 a16		; GFX9-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 a16
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: ; return to shader part epilog		; GFX9-NEXT: ; return to shader part epilog
;		;
; GFX10NSA-LABEL: gather4_c_b_2d:		; GFX10NSA-LABEL: gather4_c_b_2d:
; GFX10NSA: ; %bb.0: ; %main_body		; GFX10NSA: ; %bb.0: ; %main_body
; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo		; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
; GFX10NSA-NEXT: s_mov_b32 s0, s2		; GFX10NSA-NEXT: s_mov_b32 s0, s2
; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo		; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
		; GFX10NSA-NEXT: v_mov_b32_e32 v4, 0xffff
; GFX10NSA-NEXT: v_lshlrev_b32_e32 v3, 16, v3		; GFX10NSA-NEXT: v_lshlrev_b32_e32 v3, 16, v3
; GFX10NSA-NEXT: s_mov_b32 s1, s3
; GFX10NSA-NEXT: s_mov_b32 s2, s4		; GFX10NSA-NEXT: s_mov_b32 s2, s4
; GFX10NSA-NEXT: s_mov_b32 s3, s5
; GFX10NSA-NEXT: s_mov_b32 s4, s6		; GFX10NSA-NEXT: s_mov_b32 s4, s6
; GFX10NSA-NEXT: s_mov_b32 s5, s7
; GFX10NSA-NEXT: s_mov_b32 s6, s8		; GFX10NSA-NEXT: s_mov_b32 s6, s8
; GFX10NSA-NEXT: s_mov_b32 s7, s9
; GFX10NSA-NEXT: s_mov_b32 s8, s10		; GFX10NSA-NEXT: s_mov_b32 s8, s10
; GFX10NSA-NEXT: s_mov_b32 s9, s11
; GFX10NSA-NEXT: s_mov_b32 s10, s12		; GFX10NSA-NEXT: s_mov_b32 s10, s12
		; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16
		; GFX10NSA-NEXT: s_mov_b32 s1, s3
		; GFX10NSA-NEXT: s_mov_b32 s3, s5
		; GFX10NSA-NEXT: s_mov_b32 s5, s7
		; GFX10NSA-NEXT: s_mov_b32 s7, s9
		; GFX10NSA-NEXT: s_mov_b32 s9, s11
; GFX10NSA-NEXT: s_mov_b32 s11, s13		; GFX10NSA-NEXT: s_mov_b32 s11, s13
; GFX10NSA-NEXT: v_and_or_b32 v2, 0xffff, v2, v3		; GFX10NSA-NEXT: v_and_or_b32 v0, v0, v4, s12
		; GFX10NSA-NEXT: v_and_or_b32 v2, v2, v4, v3
; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14		; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
; GFX10NSA-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16		; GFX10NSA-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
; GFX10NSA-NEXT: s_waitcnt vmcnt(0)		; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
; GFX10NSA-NEXT: ; return to shader part epilog		; GFX10NSA-NEXT: ; return to shader part epilog
main_body:		main_body:
%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f32.f16(i32 1, float %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)		%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f16.f16(i32 1, half %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
ret <4 x float> %v		ret <4 x float> %v
}		}

define amdgpu_ps <4 x float> @gather4_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t, half %clamp) {		define amdgpu_ps <4 x float> @gather4_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %t, half %clamp) {
; GFX9-LABEL: gather4_b_cl_2d:		; GFX9-LABEL: gather4_b_cl_2d:
; GFX9: ; %bb.0: ; %main_body		; GFX9: ; %bb.0: ; %main_body
; GFX9-NEXT: s_mov_b64 s[14:15], exec		; GFX9-NEXT: s_mov_b64 s[14:15], exec
; GFX9-NEXT: s_mov_b32 s0, s2		; GFX9-NEXT: s_mov_b32 s0, s2
; GFX9-NEXT: s_wqm_b64 exec, exec		; GFX9-NEXT: s_wqm_b64 exec, exec
; GFX9-NEXT: s_mov_b32 s2, s4		; GFX9-NEXT: s_mov_b32 s2, s4
; GFX9-NEXT: s_mov_b32 s4, s6		; GFX9-NEXT: s_mov_b32 s4, s6
; GFX9-NEXT: s_mov_b32 s6, s8		; GFX9-NEXT: s_mov_b32 s6, s8
; GFX9-NEXT: s_mov_b32 s8, s10		; GFX9-NEXT: s_mov_b32 s8, s10
; GFX9-NEXT: s_mov_b32 s10, s12		; GFX9-NEXT: s_mov_b32 s10, s12
; GFX9-NEXT: v_mov_b32_e32 v4, 0xffff		; GFX9-NEXT: v_mov_b32_e32 v4, 0xffff
; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2
; GFX9-NEXT: s_lshl_b32 s12, s0, 16		; GFX9-NEXT: s_lshl_b32 s12, s0, 16
		; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2
; GFX9-NEXT: s_mov_b32 s1, s3		; GFX9-NEXT: s_mov_b32 s1, s3
; GFX9-NEXT: s_mov_b32 s3, s5		; GFX9-NEXT: s_mov_b32 s3, s5
; GFX9-NEXT: s_mov_b32 s5, s7		; GFX9-NEXT: s_mov_b32 s5, s7
; GFX9-NEXT: s_mov_b32 s7, s9		; GFX9-NEXT: s_mov_b32 s7, s9
; GFX9-NEXT: s_mov_b32 s9, s11		; GFX9-NEXT: s_mov_b32 s9, s11
; GFX9-NEXT: s_mov_b32 s11, s13		; GFX9-NEXT: s_mov_b32 s11, s13
		; GFX9-NEXT: v_and_or_b32 v0, v0, v4, s12
; GFX9-NEXT: v_and_or_b32 v1, v1, v4, v2		; GFX9-NEXT: v_and_or_b32 v1, v1, v4, v2
; GFX9-NEXT: v_and_or_b32 v2, v3, v4, s12		; GFX9-NEXT: v_and_or_b32 v2, v3, v4, s12
; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]		; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
; GFX9-NEXT: image_gather4_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 a16		; GFX9-NEXT: image_gather4_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 a16
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: ; return to shader part epilog		; GFX9-NEXT: ; return to shader part epilog
;		;
; GFX10NSA-LABEL: gather4_b_cl_2d:		; GFX10NSA-LABEL: gather4_b_cl_2d:
; GFX10NSA-NEXT: s_mov_b32 s10, s12		; GFX10NSA-NEXT: s_mov_b32 s10, s12
; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16		; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16
; GFX10NSA-NEXT: s_mov_b32 s1, s3		; GFX10NSA-NEXT: s_mov_b32 s1, s3
; GFX10NSA-NEXT: s_mov_b32 s3, s5		; GFX10NSA-NEXT: s_mov_b32 s3, s5
; GFX10NSA-NEXT: s_mov_b32 s5, s7		; GFX10NSA-NEXT: s_mov_b32 s5, s7
; GFX10NSA-NEXT: s_mov_b32 s7, s9		; GFX10NSA-NEXT: s_mov_b32 s7, s9
; GFX10NSA-NEXT: s_mov_b32 s9, s11		; GFX10NSA-NEXT: s_mov_b32 s9, s11
; GFX10NSA-NEXT: s_mov_b32 s11, s13		; GFX10NSA-NEXT: s_mov_b32 s11, s13
		; GFX10NSA-NEXT: v_and_or_b32 v0, v0, v4, s12
; GFX10NSA-NEXT: v_and_or_b32 v1, v1, v4, v2		; GFX10NSA-NEXT: v_and_or_b32 v1, v1, v4, v2
; GFX10NSA-NEXT: v_and_or_b32 v2, v3, v4, s12		; GFX10NSA-NEXT: v_and_or_b32 v2, v3, v4, s12
; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14		; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
; GFX10NSA-NEXT: image_gather4_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16		; GFX10NSA-NEXT: image_gather4_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
; GFX10NSA-NEXT: s_waitcnt vmcnt(0)		; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
; GFX10NSA-NEXT: ; return to shader part epilog		; GFX10NSA-NEXT: ; return to shader part epilog
main_body:		main_body:
%v = call <4 x float> @llvm.amdgcn.image.gather4.b.cl.2d.v4f32.f32.f16(i32 1, float %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)		%v = call <4 x float> @llvm.amdgcn.image.gather4.b.cl.2d.v4f32.f16.f16(i32 1, half %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
ret <4 x float> %v		ret <4 x float> %v
}		}

define amdgpu_ps <4 x float> @gather4_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t, half %clamp) {		define amdgpu_ps <4 x float> @gather4_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s, half %t, half %clamp) {
; GFX9-LABEL: gather4_c_b_cl_2d:		; GFX9-LABEL: gather4_c_b_cl_2d:
; GFX9: ; %bb.0: ; %main_body		; GFX9: ; %bb.0: ; %main_body
; GFX9-NEXT: s_mov_b64 s[14:15], exec		; GFX9-NEXT: s_mov_b64 s[14:15], exec
; GFX9-NEXT: s_mov_b32 s0, s2		; GFX9-NEXT: s_mov_b32 s0, s2
; GFX9-NEXT: s_wqm_b64 exec, exec		; GFX9-NEXT: s_wqm_b64 exec, exec
; GFX9-NEXT: s_mov_b32 s2, s4		; GFX9-NEXT: s_mov_b32 s2, s4
; GFX9-NEXT: s_mov_b32 s4, s6		; GFX9-NEXT: s_mov_b32 s4, s6
; GFX9-NEXT: s_mov_b32 s6, s8		; GFX9-NEXT: s_mov_b32 s6, s8
; GFX9-NEXT: s_mov_b32 s8, s10		; GFX9-NEXT: s_mov_b32 s8, s10
; GFX9-NEXT: s_mov_b32 s10, s12		; GFX9-NEXT: s_mov_b32 s10, s12
; GFX9-NEXT: v_mov_b32_e32 v5, 0xffff		; GFX9-NEXT: v_mov_b32_e32 v5, 0xffff
; GFX9-NEXT: v_lshlrev_b32_e32 v3, 16, v3
; GFX9-NEXT: s_lshl_b32 s12, s0, 16		; GFX9-NEXT: s_lshl_b32 s12, s0, 16
		; GFX9-NEXT: v_lshlrev_b32_e32 v3, 16, v3
; GFX9-NEXT: s_mov_b32 s1, s3		; GFX9-NEXT: s_mov_b32 s1, s3
; GFX9-NEXT: s_mov_b32 s3, s5		; GFX9-NEXT: s_mov_b32 s3, s5
; GFX9-NEXT: s_mov_b32 s5, s7		; GFX9-NEXT: s_mov_b32 s5, s7
; GFX9-NEXT: s_mov_b32 s7, s9		; GFX9-NEXT: s_mov_b32 s7, s9
; GFX9-NEXT: s_mov_b32 s9, s11		; GFX9-NEXT: s_mov_b32 s9, s11
; GFX9-NEXT: s_mov_b32 s11, s13		; GFX9-NEXT: s_mov_b32 s11, s13
		; GFX9-NEXT: v_and_or_b32 v0, v0, v5, s12
; GFX9-NEXT: v_and_or_b32 v2, v2, v5, v3		; GFX9-NEXT: v_and_or_b32 v2, v2, v5, v3
; GFX9-NEXT: v_and_or_b32 v3, v4, v5, s12		; GFX9-NEXT: v_and_or_b32 v3, v4, v5, s12
; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]		; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
; GFX9-NEXT: image_gather4_c_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 a16		; GFX9-NEXT: image_gather4_c_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 a16
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: ; return to shader part epilog		; GFX9-NEXT: ; return to shader part epilog
;		;
; GFX10NSA-LABEL: gather4_c_b_cl_2d:		; GFX10NSA-LABEL: gather4_c_b_cl_2d:
; GFX10NSA-NEXT: s_mov_b32 s10, s12		; GFX10NSA-NEXT: s_mov_b32 s10, s12
; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16		; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16
; GFX10NSA-NEXT: s_mov_b32 s1, s3		; GFX10NSA-NEXT: s_mov_b32 s1, s3
; GFX10NSA-NEXT: s_mov_b32 s3, s5		; GFX10NSA-NEXT: s_mov_b32 s3, s5
; GFX10NSA-NEXT: s_mov_b32 s5, s7		; GFX10NSA-NEXT: s_mov_b32 s5, s7
; GFX10NSA-NEXT: s_mov_b32 s7, s9		; GFX10NSA-NEXT: s_mov_b32 s7, s9
; GFX10NSA-NEXT: s_mov_b32 s9, s11		; GFX10NSA-NEXT: s_mov_b32 s9, s11
; GFX10NSA-NEXT: s_mov_b32 s11, s13		; GFX10NSA-NEXT: s_mov_b32 s11, s13
		; GFX10NSA-NEXT: v_and_or_b32 v0, v0, v5, s12
; GFX10NSA-NEXT: v_and_or_b32 v2, v2, v5, v3		; GFX10NSA-NEXT: v_and_or_b32 v2, v2, v5, v3
; GFX10NSA-NEXT: v_and_or_b32 v3, v4, v5, s12		; GFX10NSA-NEXT: v_and_or_b32 v3, v4, v5, s12
; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14		; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
; GFX10NSA-NEXT: image_gather4_c_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16		; GFX10NSA-NEXT: image_gather4_c_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
; GFX10NSA-NEXT: s_waitcnt vmcnt(0)		; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
; GFX10NSA-NEXT: ; return to shader part epilog		; GFX10NSA-NEXT: ; return to shader part epilog
main_body:		main_body:
%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f16(i32 1, float %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)		%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f16.f16(i32 1, half %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
ret <4 x float> %v		ret <4 x float> %v
}		}
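In these A16 gather4 tests, the size of the `vaddr` operand follows a simple pattern. The bias (now `half`) gets a dword of its own with an undefined upper half. The 32-bit `zcompare` takes a full dword. The 16-bit coordinates and clamp are packed two per dword. The sketch below is a simplified model of that counting rule. It is meant only to cross-check the GFX9 `vaddr` ranges in the 2D checks above (`v[0:1]`, `v[0:2]`, `v[0:3]`). The helpers `a16_vaddr_dwords` and `vaddr_range` are hypothetical names. This is not LLVM's packing code, which also handles other operand kinds such as offsets, gradients, and 3D/array coordinates.

```python
def a16_vaddr_dwords(full_dword_args, packed_16bit_args):
    """Count vaddr dwords under a simplified A16 layout (hypothetical helper).

    full_dword_args:   operands that occupy one dword each, e.g. the 16-bit
                       bias (upper half undefined) and the 32-bit zcompare.
    packed_16bit_args: 16-bit operands (coordinates, clamp) packed two per
                       dword in order; an odd trailing operand is padded.
    """
    return len(full_dword_args) + (len(packed_16bit_args) + 1) // 2


def vaddr_range(num_dwords, first_vgpr=0):
    """Format a contiguous VGPR range the way the GFX9 checks print it."""
    return f"v[{first_vgpr}:{first_vgpr + num_dwords - 1}]"


# The four bias-carrying 2D gather4 variants exercised by these tests.
cases = {
    "gather4_b_2d":      (["bias"], ["s", "t"]),
    "gather4_c_b_2d":    (["bias", "zcompare"], ["s", "t"]),
    "gather4_b_cl_2d":   (["bias"], ["s", "t", "clamp"]),
    "gather4_c_b_cl_2d": (["bias", "zcompare"], ["s", "t", "clamp"]),
}

for name, (full, packed) in cases.items():
    print(name, vaddr_range(a16_vaddr_dwords(full, packed)))
```

Under these assumptions, the printed ranges match the `image_gather4_*` operands in the GFX9 checks: `v[0:1]`, `v[0:2]`, `v[0:2]`, `v[0:3]`.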

define amdgpu_ps <4 x float> @gather4_l_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %lod) {		define amdgpu_ps <4 x float> @gather4_l_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %lod) {
; GFX9-LABEL: gather4_l_2d:		; GFX9-LABEL: gather4_l_2d:
; GFX9: ; %bb.0: ; %main_body		; GFX9: ; %bb.0: ; %main_body
; GFX9-NEXT: s_mov_b32 s0, s2		; GFX9-NEXT: s_mov_b32 s0, s2
; GFX9-NEXT: s_mov_b32 s2, s4		; GFX9-NEXT: s_mov_b32 s2, s4
main_body:		main_body:
%v = call <4 x float> @llvm.amdgcn.image.gather4.c.lz.2d.v4f32.f16(i32 1, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)		%v = call <4 x float> @llvm.amdgcn.image.gather4.c.lz.2d.v4f32.f16(i32 1, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
ret <4 x float> %v		ret <4 x float> %v
}		}

declare <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f16(i32 immarg, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0		declare <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f16(i32 immarg, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0
declare <4 x float> @llvm.amdgcn.image.gather4.cube.v4f32.f16(i32 immarg, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0		declare <4 x float> @llvm.amdgcn.image.gather4.cube.v4f32.f16(i32 immarg, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0
declare <4 x float> @llvm.amdgcn.image.gather4.2darray.v4f32.f16(i32 immarg, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0		declare <4 x float> @llvm.amdgcn.image.gather4.2darray.v4f32.f16(i32 immarg, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0
declare <4 x float> @llvm.amdgcn.image.gather4.cl.2d.v4f32.f16(i32 immarg, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0		declare <4 x float> @llvm.amdgcn.image.gather4.cl.2d.v4f32.f16(i32 immarg, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0
declare <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f32.f16(i32 immarg, float, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0		declare <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f16.f16(i32 immarg, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0
declare <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f32.f16(i32 immarg, float, float, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0		declare <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f16.f16(i32 immarg, half, float, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0
declare <4 x float> @llvm.amdgcn.image.gather4.b.cl.2d.v4f32.f32.f16(i32 immarg, float, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0		declare <4 x float> @llvm.amdgcn.image.gather4.b.cl.2d.v4f32.f16.f16(i32 immarg, half, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0
declare <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f16(i32 immarg, float, float, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0		declare <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f16.f16(i32 immarg, half, float, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0
declare <4 x float> @llvm.amdgcn.image.gather4.l.2d.v4f32.f16(i32 immarg, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0		declare <4 x float> @llvm.amdgcn.image.gather4.l.2d.v4f32.f16(i32 immarg, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0
declare <4 x float> @llvm.amdgcn.image.gather4.lz.2d.v4f32.f16(i32 immarg, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0		declare <4 x float> @llvm.amdgcn.image.gather4.lz.2d.v4f32.f16(i32 immarg, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0
declare <4 x float> @llvm.amdgcn.image.gather4.c.2d.v4f32.f16(i32 immarg, float, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0		declare <4 x float> @llvm.amdgcn.image.gather4.c.2d.v4f32.f16(i32 immarg, float, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0
declare <4 x float> @llvm.amdgcn.image.gather4.c.cl.2d.v4f32.f16(i32 immarg, float, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0		declare <4 x float> @llvm.amdgcn.image.gather4.c.cl.2d.v4f32.f16(i32 immarg, float, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0
declare <4 x float> @llvm.amdgcn.image.gather4.c.l.2d.v4f32.f16(i32 immarg, float, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0		declare <4 x float> @llvm.amdgcn.image.gather4.c.l.2d.v4f32.f16(i32 immarg, float, half, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0
declare <4 x float> @llvm.amdgcn.image.gather4.c.lz.2d.v4f32.f16(i32 immarg, float, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0		declare <4 x float> @llvm.amdgcn.image.gather4.c.lz.2d.v4f32.f16(i32 immarg, float, half, half, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #0

attributes #0 = { nounwind readonly }		attributes #0 = { nounwind readonly }
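The new checks in the file above appear to be the GlobalISel variant of the test, judging by the summary's remark about explicit upper-half fills. For `gather4_b_2d`, the bias dword is built with `s_lshl_b32 s12, s0, 16` followed by `v_and_or_b32 v0, v0, v3, s12` (with `v3 = 0xffff`). At that point `s0` holds a copy of `s2`, so the upper 16 bits of the bias dword carry no meaningful value. The Python sketch below models only the 32-bit arithmetic of those check lines. It assumes the ISA semantics `S_LSHL_B32: D = S0 << S1[4:0]`, `V_LSHLREV_B32: D = S1 << S0[4:0]`, and `V_AND_OR_B32: D = (S0 & S1) | S2`. All register values are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def s_lshl_b32(src, shift):
    """Model of S_LSHL_B32: D = S0 << S1[4:0], truncated to 32 bits."""
    return (src << (shift & 31)) & MASK32


def v_lshlrev_b32(shift, src):
    """Model of V_LSHLREV_B32: D = S1 << S0[4:0] (shift amount comes first)."""
    return (src << (shift & 31)) & MASK32


def v_and_or_b32(src0, src1, src2):
    """Model of V_AND_OR_B32: D = (S0 & S1) | S2."""
    return ((src0 & src1) | src2) & MASK32


# Hypothetical register contents (illustrative values only).
v0_bias = 0xDEAD3C00  # f16 1.0 (0x3C00) in the low half, stale bits above
v1_s = 0x00004000     # f16 2.0
v2_t = 0x00004200     # f16 3.0
s0 = 0x12345678       # whatever s0 holds after `s_mov_b32 s0, s2`
v3_mask = 0xFFFF      # v_mov_b32_e32 v3, 0xffff

# gather4_b_2d, new GlobalISel check lines:
s12 = s_lshl_b32(s0, 16)                  # s_lshl_b32 s12, s0, 16
v2 = v_lshlrev_b32(16, v2_t)              # v_lshlrev_b32_e32 v2, 16, v2
v0 = v_and_or_b32(v0_bias, v3_mask, s12)  # v_and_or_b32 v0, v0, v3, s12
v1 = v_and_or_b32(v1_s, v3_mask, v2)      # v_and_or_b32 v1, v1, v3, v2

print(hex(v0), hex(v1))  # prints: 0x56783c00 0x42004000
```

Only the low half of `v0` (the `half` bias) is meaningful. Its upper half is whatever `s0 << 16` produced, which is why these extra instructions look like a candidate for later cleanup rather than a correctness requirement.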

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll

	; GFX10-NEXT: image_gather4_c_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_gather4_c_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.2d.v4f32.f32(i32 1, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.2d.v4f32.f32(i32 1, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t) {			define amdgpu_ps <4 x float> @gather4_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %t) {
	; GFX9-LABEL: gather4_b_2d:			; GFX9-LABEL: gather4_b_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_b_2d:			; GFX10-LABEL: gather4_b_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f32.f16(i32 1, float %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f16.f16(i32 1, half %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t) {			define amdgpu_ps <4 x float> @gather4_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s, half %t) {
	; GFX9-LABEL: gather4_c_b_2d:			; GFX9-LABEL: gather4_c_b_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_c_b_2d:			; GFX10-LABEL: gather4_c_b_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f32.f16(i32 1, float %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f16.f16(i32 1, half %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @gather4_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %t, half %clamp) {
	; GFX9-LABEL: gather4_b_cl_2d:			; GFX9-LABEL: gather4_b_cl_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v5, v3			; GFX9-NEXT: v_mov_b32_e32 v5, v3
	; GFX9-NEXT: v_mov_b32_e32 v3, v0			; GFX9-NEXT: v_mov_b32_e32 v3, v0
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1
	; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_gather4_b_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_b_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_b_cl_2d:			; GFX10-LABEL: gather4_b_cl_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_gather4_b_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_gather4_b_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.b.cl.2d.v4f32.f32.f16(i32 1, float %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.b.cl.2d.v4f32.f16.f16(i32 1, half %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @gather4_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s, half %t, half %clamp) {
	; GFX9-LABEL: gather4_c_b_cl_2d:			; GFX9-LABEL: gather4_c_b_cl_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v7, v4			; GFX9-NEXT: v_mov_b32_e32 v7, v4
	; GFX9-NEXT: v_mov_b32_e32 v4, v0			; GFX9-NEXT: v_mov_b32_e32 v4, v0
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v2			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v2
	; GFX9-NEXT: v_mov_b32_e32 v5, v1			; GFX9-NEXT: v_mov_b32_e32 v5, v1
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_gather4_c_b_cl v[0:3], [v0, v1, v2, v4], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_gather4_c_b_cl v[0:3], [v0, v1, v2, v4], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f16(i32 1, float %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f16.f16(i32 1, half %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_l_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %lod) {			define amdgpu_ps <4 x float> @gather4_l_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %lod) {
	; GFX9-LABEL: gather4_l_2d:			; GFX9-LABEL: gather4_l_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	declare <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f16(i32, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f16(i32, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.cube.v4f32.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.cube.v4f32.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.2darray.v4f32.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.2darray.v4f32.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	declare <4 x float> @llvm.amdgcn.image.gather4.c.2d.v4f32.f32(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.c.2d.v4f32.f32(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.cl.2d.v4f32.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.cl.2d.v4f32.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.c.cl.2d.v4f32.f32(i32, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.c.cl.2d.v4f32.f32(i32, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	declare <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f32.f16(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f16.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f32.f16(i32, float, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f16.f16(i32, half, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.b.cl.2d.v4f32.f32.f16(i32, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.b.cl.2d.v4f32.f16.f16(i32, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f16(i32, float, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f16.f16(i32, half, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	declare <4 x float> @llvm.amdgcn.image.gather4.l.2d.v4f32.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.l.2d.v4f32.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.c.l.2d.v4f32.f32(i32, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.c.l.2d.v4f32.f32(i32, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	declare <4 x float> @llvm.amdgcn.image.gather4.lz.2d.v4f32.f16(i32, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.lz.2d.v4f32.f16(i32, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.c.lz.2d.v4f32.f32(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.c.lz.2d.v4f32.f32(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readonly }			attributes #1 = { nounwind readonly }
	attributes #2 = { nounwind readnone }			attributes #2 = { nounwind readnone }

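The CHECK lines in the a16 sample tests below show how the address operands end up in VGPR dwords once bias is a `half`. Each operand that precedes the coordinates gets a whole dword: `zcompare` is a full 32-bit value, and the 16-bit `bias` sits in the low half with the upper 16 bits left undefined. The 16-bit coordinates (and `clamp`) are then packed two per dword, with the `v_and_b32 ..., 0xffff` plus `v_lshl_or_b32 ..., 16, ...` pair putting the first operand in the low half. For example, `sample_c_b_cl_2d` uses four dwords (`[v0, v1, v2, v4]` on GFX10). The Python sketch below models that layout. It is an illustration inferred from these checks and the patch summary, not code from the AMDGPU backend; the helper names `half_bits`, `pack_half2` and `a16_dword_layout` are hypothetical, and gradient operands are not covered.

```python
import struct
from typing import List, Tuple


def half_bits(x: float) -> int:
    """Return the IEEE-754 binary16 bit pattern of x (Python's struct 'e' format)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def pack_half2(lo: int, hi: int) -> int:
    """Pack two 16-bit patterns into one dword: low operand masked to 16 bits,
    high operand shifted left by 16 (the v_and_b32 + v_lshl_or_b32 pair in the checks)."""
    return ((hi << 16) | (lo & 0xFFFF)) & 0xFFFFFFFF


def a16_dword_layout(pre: List[Tuple[str, int]], coords: List[str]) -> List[str]:
    """Hypothetical model of the A16 address-dword layout seen in the sample tests.

    pre:    operands placed before the coordinates, as (name, bit width).
            Each gets a whole dword; a 16-bit one (bias) leaves the upper half undefined.
    coords: 16-bit coordinate operands (and clamp), packed two per dword, low half first.
    """
    layout = []
    for name, bits in pre:
        layout.append(f"{name}|undef" if bits == 16 else name)
    for i in range(0, len(coords), 2):
        pair = coords[i:i + 2]
        layout.append("|".join(pair) if len(pair) == 2 else f"{pair[0]}|undef")
    return layout


# sample_c_b_cl_2d: bias, zcompare, s, t, clamp -> four dwords
print(a16_dword_layout([("bias", 16), ("zcompare", 32)], ["s", "t", "clamp"]))
```

Each string names the low half first (`"s|t"` means `s` in bits 0-15 and `t` in bits 16-31), and `undef` marks the half the hardware ignores. Keeping `bias` alone in its dword is what lets the A16-OFF `float` layout and the A16-ON `half` layout share the same dword count for the pre-coordinate operands.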
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.a16.dim.ll

	… first 314 lines not shown …
	; GFX10-NEXT: image_sample_c_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_sample_c_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f16(i32 15, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f16(i32 15, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s) {			define amdgpu_ps <4 x float> @sample_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s) {
	; GFX9-LABEL: sample_b_1d:			; GFX9-LABEL: sample_b_1d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_b_1d:			; GFX10-LABEL: sample_b_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16			; GFX10-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f16(i32 15, float %bias, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f16.f16(i32 15, half %bias, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t) {			define amdgpu_ps <4 x float> @sample_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %t) {
	; GFX9-LABEL: sample_b_2d:			; GFX9-LABEL: sample_b_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_b_2d:			; GFX10-LABEL: sample_b_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f16(i32 15, float %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f16.f16(i32 15, half %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s) {			define amdgpu_ps <4 x float> @sample_c_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s) {
	; GFX9-LABEL: sample_c_b_1d:			; GFX9-LABEL: sample_c_b_1d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_b_1d:			; GFX10-LABEL: sample_c_b_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16			; GFX10-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f16.f16(i32 15, half %bias, float %zcompare, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t) {			define amdgpu_ps <4 x float> @sample_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s, half %t) {
	; GFX9-LABEL: sample_c_b_2d:			; GFX9-LABEL: sample_c_b_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_b_2d:			; GFX10-LABEL: sample_c_b_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f16.f16(i32 15, half %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %clamp) {			define amdgpu_ps <4 x float> @sample_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %clamp) {
	; GFX9-LABEL: sample_b_cl_1d:			; GFX9-LABEL: sample_b_cl_1d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_b_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_b_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_b_cl_1d:			; GFX10-LABEL: sample_b_cl_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_b_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16			; GFX10-NEXT: image_sample_b_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f16(i32 15, float %bias, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f16.f16(i32 15, half %bias, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @sample_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %t, half %clamp) {
	; GFX9-LABEL: sample_b_cl_2d:			; GFX9-LABEL: sample_b_cl_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v5, v3			; GFX9-NEXT: v_mov_b32_e32 v5, v3
	; GFX9-NEXT: v_mov_b32_e32 v3, v0			; GFX9-NEXT: v_mov_b32_e32 v3, v0
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1
	; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_b_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_b_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_b_cl_2d:			; GFX10-LABEL: sample_b_cl_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_b_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_sample_b_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f16(i32 15, float %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f16.f16(i32 15, half %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %clamp) {			define amdgpu_ps <4 x float> @sample_c_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s, half %clamp) {
	; GFX9-LABEL: sample_c_b_cl_1d:			; GFX9-LABEL: sample_c_b_cl_1d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_c_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_b_cl_1d:			; GFX10-LABEL: sample_c_b_cl_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_c_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16			; GFX10-NEXT: image_sample_c_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f16.f16(i32 15, half %bias, float %zcompare, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @sample_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s, half %t, half %clamp) {
	; GFX9-LABEL: sample_c_b_cl_2d:			; GFX9-LABEL: sample_c_b_cl_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v7, v4			; GFX9-NEXT: v_mov_b32_e32 v7, v4
	; GFX9-NEXT: v_mov_b32_e32 v4, v0			; GFX9-NEXT: v_mov_b32_e32 v4, v0
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v2			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v2
	; GFX9-NEXT: v_mov_b32_e32 v5, v1			; GFX9-NEXT: v_mov_b32_e32 v5, v1
	… 9 lines not shown …
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_c_b_cl v[0:3], [v0, v1, v2, v4], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_sample_c_b_cl v[0:3], [v0, v1, v2, v4], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f16.f16(i32 15, half %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_d_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s) {			define amdgpu_ps <4 x float> @sample_d_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s) {
	; GFX9-LABEL: sample_d_1d:			; GFX9-LABEL: sample_d_1d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: image_sample_d v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_d v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	… 683 lines not shown …

	declare <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f16(i32, float, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f16(i32, float, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f16(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f16(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.cl.1d.v4f32.f16(i32, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.cl.1d.v4f32.f16(i32, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.cl.2d.v4f32.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.cl.2d.v4f32.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.c.cl.1d.v4f32.f16(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.cl.1d.v4f32.f16(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f16(i32, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f16(i32, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	declare <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f16(i32, float, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f16.f16(i32, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f16(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f16.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f16(i32, float, float, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f16.f16(i32, half, float, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f16(i32, float, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f16.f16(i32, half, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f16(i32, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f16.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f16(i32, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f16.f16(i32, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f16(i32, float, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f16.f16(i32, half, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f16(i32, float, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f16.f16(i32, half, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	declare <4 x float> @llvm.amdgcn.image.sample.d.1d.v4f32.f16.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.d.1d.v4f32.f16.f16(i32, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f16.f16(i32, half, half, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f16.f16(i32, half, half, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.d.3d.v4f32.f16.f16(i32, half, half, half, half, half, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.d.3d.v4f32.f16.f16(i32, half, half, half, half, half, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.c.d.1d.v4f32.f32.f16(i32, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.d.1d.v4f32.f32.f16(i32, float, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f32.f16(i32, float, half, half, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f32.f16(i32, float, half, half, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.d.cl.1d.v4f32.f16.f16(i32, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.d.cl.1d.v4f32.f16.f16(i32, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f16.f16(i32, half, half, half, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f16.f16(i32, half, half, half, half, half, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	… 28 lines not shown …