This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/trunk/include/llvm/IR/IntrinsicsAMDGPU.td
llvm/trunk/lib/Target/AMDGPU/AMDGPU.h
llvm/trunk/lib/Target/AMDGPU/AMDGPU.td
llvm/trunk/lib/Target/AMDGPU/AMDGPUSubtarget.h
llvm/trunk/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
llvm/trunk/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
llvm/trunk/lib/Target/AMDGPU/CMakeLists.txt
llvm/trunk/lib/Target/AMDGPU/MIMGInstructions.td
llvm/trunk/lib/Target/AMDGPU/SIAddIMGInit.cpp
llvm/trunk/lib/Target/AMDGPU/SIISelLowering.cpp
llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp
llvm/trunk/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h
llvm/trunk/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.ll
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.image.load.a16.d16.ll
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.image.load.a16.ll
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.d16.dim.ll
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.dim.ll
llvm/trunk/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll
Differential D48826

[AMDGPU] Add support for TFE/LWE in image intrinsics
ClosedPublic

Authored by dstuttard on Jul 2 2018, 4:32 AM.

Details

Reviewers

nhaehnle
sheredom

Commits

rGf77079f89254: [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try
rGde02e4b1cc68: Add support for TFE/LWE in image intrinsics
rL351054: [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try
rL347871: Add support for TFE/LWE in image intrinsics

Summary

TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.
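
To make the cost of the power-of-2 constraint concrete, here is a minimal sketch of the sizing rule described above. It is not code from the patch: both function names are hypothetical, and it assumes one dword per enabled dmask channel plus one status dword when TFE or LWE is set.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical helper: dwords an image load may write for a 4-bit dmask,
// plus one status dword when TFE or LWE is enabled.
// The dmask == 0 corner case raised later in the review is not modelled.
unsigned requiredResultDwords(unsigned DMask, bool TFEOrLWE) {
  assert(DMask != 0 && DMask <= 0xF && "sketch models a non-empty 4-bit dmask");
  unsigned Data = static_cast<unsigned>(std::bitset<4>(DMask).count());
  return Data + (TFEOrLWE ? 1u : 0u);
}

// Smallest power-of-two vector length that can hold Needed elements.
unsigned vectorLengthFor(unsigned Needed) {
  unsigned Len = 1;
  while (Len < Needed)
    Len *= 2;
  return Len;
}
```

For a full four-channel load with TFE, `requiredResultDwords(0xF, true)` is 5 and `vectorLengthFor(5)` is 8, so three elements of the return vector are never written.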

This change consists of roughly five parts:

1. Modify the instruction defs in tablegen to add new instruction variants that can accommodate the extra return values.

2. Update lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE (where the bulk of the work for these instruction types is now done).

3. Add extra verification code to catch cases where the intrinsics have been used but insufficient return registers are provided.

4. Modify the adjustWritemask optimisation to account for TFE/LWE being enabled (extra registers must be maintained for the error return value).

5. Add an extra pass to zero-initialize the error value return. If the error does not occur, the register is not written and thus must be zeroed before use. A new (on by default) option ensures that ALL return values are zero-initialized, which is required for sparse texture support.
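
The initialization rule in the last part can be sketched as follows. This is an illustrative model rather than the pass itself: `dwordsToZero` is a hypothetical name, and the sketch assumes the status dword is the last dword of the result.

```cpp
#include <vector>

// Hypothetical model of which result dwords get zero-initialized ahead of
// an image instruction that has TFE or LWE enabled.
// Assumes the status dword is the last of NumResultDwords.
std::vector<unsigned> dwordsToZero(unsigned NumResultDwords, bool TFEOrLWE,
                                   bool PRTStrictNull) {
  std::vector<unsigned> Indices;
  if (!TFEOrLWE || NumResultDwords == 0)
    return Indices; // nothing to initialize without a status return
  // Strict mode zeroes every result dword; otherwise only the status dword.
  unsigned First = PRTStrictNull ? 0 : NumResultDwords - 1;
  for (unsigned I = First; I < NumResultDwords; ++I)
    Indices.push_back(I);
  return Indices;
}
```

With five result dwords, strict mode yields indices 0 through 4, while the non-strict mode yields only index 4.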

Tests have been added/modified to test the new behaviour.

Diff Detail

Repository: rL LLVM

Event Timeline

dstuttard created this revision. Jul 2 2018, 4:32 AM

Herald added subscribers: llvm-commits, t-tye, tpr and 6 others. Jul 2 2018, 4:32 AM

Harbormaster completed remote builds in B19934: Diff 153696. Jul 2 2018, 4:32 AM

dstuttard added a reviewer: nhaehnle. Jul 2 2018, 4:33 AM

@nhaehnle - just added you as reviewer at the moment.

I wasn't sure if the separate pass to do the mimg result initialization would be better done in lowerImage in SIISelLowering.cpp, but thought I'd submit what I've got to canvass opinion.

tpr added inline comments. Jul 2 2018, 6:30 AM

lib/Target/AMDGPU/AMDGPU.h
146 ↗	(On Diff #153696)	This accidental double semicolon gave me lots of warnings.

I would expect the intrinsics to change for this. You can use a struct return type, which is what I would expect for this. Something like { <4 x float>, i1 }? You also could have a 5 element vector, it would just require more work to deal with during lowering

arsenm added inline comments. Jul 2 2018, 7:13 AM

lib/Target/AMDGPU/SIAddIMGInit.cpp
16 ↗	(On Diff #153696)	This needs to go below includes
27 ↗	(On Diff #153696)	Why do you need to include this?
46 ↗	(On Diff #153696)	You don't need this, the default comes from the INITIALIZE_PASS
83 ↗	(On Diff #153696)	MI.mayStore() works
84–86 ↗	(On Diff #153696)	Capitalize
112 ↗	(On Diff #153696)	New line

tpr added inline comments. Jul 4 2018, 12:58 PM

lib/Target/AMDGPU/AMDGPU.td
380 ↗	(On Diff #153696)	Surely this feature should be called enable-prt-strict-null?

nhaehnle added inline comments. Jul 27 2018, 7:13 AM

lib/Target/AMDGPU/SIISelLowering.cpp
7799 ↗	(On Diff #153696)	This should be a bool.
lib/Target/AMDGPU/SIInstrInfo.cpp
2780 ↗	(On Diff #153696)	Can also use `!MI.mayStore()`
2793–2798 ↗	(On Diff #153696)	It's a minor thing, but I think it would be easier to follow to first divide the RegCount for D16 based on `(D16 && D16->getImm() && !ST.hasUnpackedD16VMem())`, and then increment that for LWE || TFE afterwards.
test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.ll
15–19 ↗	(On Diff #153696)	This doesn't test everything we'd want to test here, which is that in the NOPRT case, you don't have initializations of v0-v3. I'd suggest adding STRICTPRT and NONSTRICTPRT check prefixes, and adding GCN to the new run line as well, so you don't need to duplicate the image_load line, and you can use NONSTRICTPRT-NOT lines.
test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.dim.ll
21 ↗	(On Diff #153696)	How about checks for initialization here?
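
The ordering suggested for the SIInstrInfo.cpp check above can be sketched as below. This is a hypothetical standalone model, not the verifier code: `expectedRegCount` is an invented name, `PackedD16` stands in for `(D16 && D16->getImm() && !ST.hasUnpackedD16VMem())`, and the sketch assumes a non-empty 4-bit dmask.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical model of the register-count computation: halve for packed D16
// first, then add the status dword for TFE/LWE.
unsigned expectedRegCount(unsigned DMask, bool PackedD16, bool TFEOrLWE) {
  assert(DMask != 0 && DMask <= 0xF && "sketch models a non-empty 4-bit dmask");
  unsigned RegCount = static_cast<unsigned>(std::bitset<4>(DMask).count());
  if (PackedD16)
    RegCount = (RegCount + 1) / 2; // two halves share one dword
  if (TFEOrLWE)
    RegCount += 1; // the status dword is a full dword and is never packed
  return RegCount;
}
```

Doing the D16 division before the increment keeps the status dword out of the halving: four packed halves with TFE need 2 + 1 = 3 registers.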

Please also fold in the fix to disable InstCombine for image ops with tfe/lwe (AMDVLK fixes c5675e2d, fea7c135, 7b7276ec).

lib/Target/AMDGPU/SIISelLowering.cpp
7798 ↗	(On Diff #153696)	Please use operand names (per AMDVLK fix 458248e9).

Herald added a subscriber: jvesely. Aug 29 2018, 6:32 AM

Folded in most of the changes highlighted in the review

There are also several new changes to the original patch in light of bugs
uncovered during further testing with applications using the support.

I haven't attempted to implement Matt's suggestion of using a struct return type
or 5-vec support. Could we add that as something to do at a later date?
Nicolai, do you have any feedback on that piece of work - my feeling is that it
is quite a complicated change based on how the intrinsics are defined. If not, I
can take another look if you can give some pointers.

Harbormaster completed remote builds in B22489: Diff 164921. Sep 11 2018, 10:31 AM

dstuttard marked 14 inline comments as done. Sep 11 2018, 10:35 AM

dstuttard added inline comments.

lib/Target/AMDGPU/SIInstrInfo.cpp
2793–2798 ↗	(On Diff #153696)	Agreed - not really sure how I arrived at the original non-obvious way. Suspect it evolved.

dstuttard marked 2 inline comments as done. Sep 11 2018, 10:36 AM

Thanks! Some small issues left, and please also add a test case for the "simplifyDemanded" implementation.

As for the question of return types: on second thought, having <8 x half> as a return type when halves 5 & 6 are really combined to a single i32 LWE/TFE return is indeed a bit awkward. It would make sense to have {<4 x half>, i32} instead at the LLVM IR level. The intrinsic definitions themselves should relax fairly easily. You need to replace llvm_anyfloat_ty by llvm_any_ty in:

AMDGPUDimSampleProfile
defms for int_amdgcn_image_load{,_mip}

The machine instructions themselves would stay the same. The only risk is the SelectionDAG itself, though I don't see anything that would fail or be particularly complicated to deal with right now. I believe the struct type just gets split into two separate result values of the SDNode.

lib/Target/AMDGPU/SIAddIMGInit.cpp
103 ↗	(On Diff #164921)	Variable names should be capitalized, here and below.
170 ↗	(On Diff #164921)	Space between MF and MI.
lib/Target/AMDGPU/SIISelLowering.cpp
4643 ↗	(On Diff #164921)	The LoadVT.isVector() is redundant.
lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
935–941 ↗	(On Diff #164921)	TFE and LWE enables are combined in a single argument, so this needs to be fixed.
test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.ll
620 ↗	(On Diff #164921)	Oops :)
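
The point about the combined TFE/LWE argument can be illustrated with a small decoder. This is a sketch under an assumed layout (bit 0 enables TFE, bit 1 enables LWE); the struct and function names are hypothetical and do not come from the patch.

```cpp
#include <cstdint>

// Hypothetical decoded form of the single combined TFE/LWE argument.
struct TexFailCtrl {
  bool TFE;
  bool LWE;
  bool Valid; // false if bits outside the two enables are set
};

// Assumed layout: bit 0 enables TFE, bit 1 enables LWE.
TexFailCtrl decodeTexFailCtrl(uint32_t Value) {
  TexFailCtrl Ctrl;
  Ctrl.TFE = (Value & 0x1) != 0;
  Ctrl.LWE = (Value & 0x2) != 0;
  Ctrl.Valid = (Value & ~0x3u) == 0;
  return Ctrl;
}

// A demanded-elements simplification would need to bail out when this is true.
bool hasStatusReturn(uint32_t Value) {
  TexFailCtrl Ctrl = decodeTexFailCtrl(Value);
  return Ctrl.TFE || Ctrl.LWE;
}
```

Checking one argument for a nonzero enable, instead of two separate operands, is the kind of fix the comment on InstCombineSimplifyDemanded.cpp asks for.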

Changed the implementation of the intrinsic return type to be an aggregate type

Harbormaster completed remote builds in B24125: Diff 170823. Oct 24 2018, 1:00 AM

Covered all the requested changes (I think). Also implemented a test to make sure that simplifyDemanded doesn't run when TFE/LWE is enabled.

Thank you for making these changes. I have some detail comments inline and some high-level remarks:

We need tests for {float, i32}, {<2 x float>, i32}, and {half, i32} return types
We need tests with dmask = 0 from the very beginning in the intrinsic (there's a case in lowerImage which looks like it may be broken), and a test with dmask != 0 in the intrinsic, but none of the data returns are being used (only the i32 return is used).
We need tests where the dmask is materially smaller than the return type, e.g. return type {<4 x float>, i32} and popcount(dmask) < 4, and return type {<2 x half>, i32} and popcount(dmask) <= 2.

I also have a high-level concern about the design of lowerImage, because the adjustRetValue feels a bit like spooky action at a distance, plus concern about the last point above about dmask. Have you considered the following possibility:

Keep ReturnTypes[0] as-is until the code that determines NumVDataDwords.
In the load case, adjust NumVDataDwords first based on popcount(dmask) and then based on whether LWE/TFE is enabled.
Synthesize a new ReturnTypes[0] from scratch based on NumVDataDwords
Then generically extract the data payloads vector from the NewNode, doing cast and adjust-for-unpackedD16 in a common path for both with and without TFE/LWE.

lib/Target/AMDGPU/SIAddIMGInit.cpp
7 ↗	(On Diff #170823)	The more common practice seems to be having a separator line here between the license info and the file description.
20–21 ↗	(On Diff #170823)	I believe you don't need these two includes.
86–87 ↗	(On Diff #170823)	I'd prefer this to be an assertion. It's easy enough to change if we ever do get MIMG instructions without TFE/LWE. In the meantime, assertions allow us to be more explicit about the space of possibilities we actually need to think about.
108 ↗	(On Diff #170823)	Same argument here: I'd prefer this to be an assertion.
117 ↗	(On Diff #170823)	It seems prudent to add a static assertion that `AMDGPU::sub0 == 1 && AMDGPU::sub4 == 5` here.
lib/Target/AMDGPU/SIISelLowering.cpp
4722 ↗	(On Diff #170823)	This can be moved lower.
lib/Transforms/InstCombine/InstCombineInternal.h
803 ↗	(On Diff #170823)	Should be TFCIdx for consistency.

Made minor code changes suggested in review

Harbormaster completed remote builds in B24329: Diff 171699. Oct 30 2018, 7:51 AM

Minor changes made.

I'll look at implementing the extra tests you highlight - I think at least one of those might already be covered, but I'll verify that during the implementation of the new ones.
I think we might also have an issue with non-TFE/LWE variants that have a return type that is larger than the dmask (the last of your bullets) - so fixing this for the TFE/LWE case might also fix that - I'll check.

Regarding the suggested re-factor to remove the spooky effect at a distance code :) - agreed that it is a bit of a muddle as it stands. I think your suggestion is a good one so I'll look at re-factoring as suggested.

lib/Target/AMDGPU/SIAddIMGInit.cpp
86–87 ↗	(On Diff #170823)	Agreed, that makes sense

Thanks for making the changes.

I think we got away with return type larger than dmask previously, because in the worst case we just over-conservatively allocate registers. But now that the number of registers actually matters because it's used to access the TFE/LWE return...
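
A small sketch of why the register count now matters: if the status dword directly follows the data dwords actually written, its position depends on the dmask and not on the IR return type. The names below are hypothetical, and the model assumes unpacked 32-bit data and a non-empty 4-bit dmask.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical: index of the TFE/LWE status dword within the result registers.
unsigned statusDwordIndex(unsigned DMask) {
  assert(DMask != 0 && DMask <= 0xF && "sketch models a non-empty 4-bit dmask");
  // Data occupies one dword per enabled channel; the status dword follows.
  return static_cast<unsigned>(std::bitset<4>(DMask).count());
}

// True only when reading the status at the return type's vector length
// would hit the dword that was actually written.
bool statusIndexMatchesReturnType(unsigned DMask, unsigned ReturnVectorElts) {
  return statusDwordIndex(DMask) == ReturnVectorElts;
}
```

For a return type of {<4 x float>, i32} with dmask 0x3, the status lands in dword 2 under this model, so `statusIndexMatchesReturnType(0x3, 4)` is false.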

Modified based on review feedback from Nicolai

Not sure that the new code is any less spooky, but I'm sure it is more robust.
New requested tests also implemented.
Rebased onto latest trunk and fixed new issues with a16 support.

Herald added a subscriber: jfb. Nov 7 2018, 6:21 AM

Harbormaster completed remote builds in B24669: Diff 172942. Nov 7 2018, 6:21 AM

Added Neil as a reviewer as I've made some changes to some of his a16 tests. I'm pretty certain that the modifications are correct, but wanted to get feedback on that as well.

I think that we can get rid of adjustWriteMask altogether at some point, but it will require extra support in the instcombine for the image instructions to support TFE/LWE (it just gives up at the moment), but it is definitely wrong to have this code in 2 places. However, we can do that as a separate change - this one is getting quite large already!

Thank you, this looks much cleaner. I only have a small number of nitpicks left.

lib/Target/AMDGPU/SIISelLowering.cpp
885 ↗	(On Diff #172942)	Space after the comma.
4668 ↗	(On Diff #172942)	ResultTypes can be const, right? Use ArrayRef if yes, SmallVectorImpl otherwise.
4711–4713 ↗	(On Diff #172942)	I think you can directly ExtractVectorElements into `BVElts` and get rid of `Elts` entirely.
4980–4984 ↗	(On Diff #172942)	Please put braces around the multi-line if "block" here.

Thanks for the review - made all the suggested changes

lib/Target/AMDGPU/SIISelLowering.cpp
4711–4713 ↗	(On Diff #172942)	Doh.

Harbormaster completed remote builds in B25147: Diff 174613. Nov 19 2018, 7:35 AM

ping

LGTM

This revision is now accepted and ready to land. Nov 29 2018, 4:04 AM

Closed by commit rL347871: Add support for TFE/LWE in image intrinsics (authored by dstuttard). Nov 29 2018, 7:24 AM

This revision was automatically updated to reflect the committed changes.

kosarev mentioned this in D138215: [AMDGPU][CodeGen] Support raw format TFE buffer loads other than byte, short and d16 ones. Nov 18 2022, 3:39 AM

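Revision Contents

Path	Size
llvm/trunk/include/llvm/IR/IntrinsicsAMDGPU.td	6 lines
llvm/trunk/lib/Target/AMDGPU/AMDGPU.h	4 lines
llvm/trunk/lib/Target/AMDGPU/AMDGPU.td	10 lines
llvm/trunk/lib/Target/AMDGPU/AMDGPUSubtarget.h	7 lines
llvm/trunk/lib/Target/AMDGPU/AMDGPUSubtarget.cpp	6 lines
llvm/trunk/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	1 line
llvm/trunk/lib/Target/AMDGPU/CMakeLists.txt	1 line
llvm/trunk/lib/Target/AMDGPU/MIMGInstructions.td	10 lines
llvm/trunk/lib/Target/AMDGPU/SIAddIMGInit.cpp	181 lines
llvm/trunk/lib/Target/AMDGPU/SIISelLowering.cpp	343 lines
llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp	36 lines
llvm/trunk/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h	1 line
llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h	3 lines
llvm/trunk/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp	18 lines
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.ll	438 lines
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.image.load.a16.d16.ll	12 lines
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.image.load.a16.ll	12 lines
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.d16.dim.ll	53 lines
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.dim.ll	186 lines
llvm/trunk/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll	23 lines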

Diff 175869

llvm/trunk/include/llvm/IR/IntrinsicsAMDGPU.td

@@ ... @@ class AMDGPUDimProfileCopy<AMDGPUDimProfile base> : AMDGPUDimProfile<base.OpMod, base.Dim> {
   let Gradients = base.Gradients;
   let LodClampMip = base.LodClampMip;
 }

 class AMDGPUDimSampleProfile<string opmod,
                              AMDGPUDimProps dim,
                              AMDGPUSampleVariant sample> : AMDGPUDimProfile<opmod, dim> {
   let IsSample = 1;
-  let RetTypes = [llvm_anyfloat_ty];
+  let RetTypes = [llvm_any_ty];
   let ExtraAddrArgs = sample.ExtraAddrArgs;
   let Gradients = sample.Gradients;
   let LodClampMip = sample.LodOrClamp;
 }

 class AMDGPUDimNoSampleProfile<string opmod,
                                AMDGPUDimProps dim,
                                list<LLVMType> retty,
@@ ... @@ foreach dim = AMDGPUDims.All in {
     def !strconcat(NAME, "_", dim.Name)
       : AMDGPUImageDimIntrinsic<
           AMDGPUDimNoSampleProfile<opmod, dim, retty, dataargs, Mip>,
           props, sdnodeprops>;
   }
 }

 defm int_amdgcn_image_load
-  : AMDGPUImageDimIntrinsicsAll<"LOAD", [llvm_anyfloat_ty], [], [IntrReadMem],
+  : AMDGPUImageDimIntrinsicsAll<"LOAD", [llvm_any_ty], [], [IntrReadMem],
                                 [SDNPMemOperand]>,
     AMDGPUImageDMaskIntrinsic;
 defm int_amdgcn_image_load_mip
-  : AMDGPUImageDimIntrinsicsNoMsaa<"LOAD_MIP", [llvm_anyfloat_ty], [],
+  : AMDGPUImageDimIntrinsicsNoMsaa<"LOAD_MIP", [llvm_any_ty], [],
                                    [IntrReadMem], [SDNPMemOperand], 1>,
     AMDGPUImageDMaskIntrinsic;

 defm int_amdgcn_image_store : AMDGPUImageDimIntrinsicsAll<
               "STORE", [], [AMDGPUArg<llvm_anyfloat_ty, "vdata">],
               [IntrWriteMem], [SDNPMemOperand]>;
 defm int_amdgcn_image_store_mip : AMDGPUImageDimIntrinsicsNoMsaa<
               "STORE_MIP", [], [AMDGPUArg<llvm_anyfloat_ty, "vdata">],

llvm/trunk/lib/Target/AMDGPU/AMDGPU.h

@@ ... @@
 FunctionPass *createR600ISelDag(TargetMachine *TM, CodeGenOpt::Level OptLevel);

 // SI Passes
 FunctionPass *createSIAnnotateControlFlowPass();
 FunctionPass *createSIFoldOperandsPass();
 FunctionPass *createSIPeepholeSDWAPass();
 FunctionPass *createSILowerI1CopiesPass();
 FunctionPass *createSIFixupVectorISelPass();
+FunctionPass *createSIAddIMGInitPass();
 FunctionPass *createSIShrinkInstructionsPass();
 FunctionPass *createSILoadStoreOptimizerPass();
 FunctionPass *createSIWholeQuadModePass();
 FunctionPass *createSIFixControlFlowLiveIntervalsPass();
 FunctionPass *createSIOptimizeExecMaskingPreRAPass();
 FunctionPass *createSIFixSGPRCopiesPass();
 FunctionPass *createSIMemoryLegalizerPass();
 FunctionPass *createSIDebuggerInsertNopsPass();
@@ ... @@
 extern char &SIFixWWMLivenessID;

 void initializeAMDGPUSimplifyLibCallsPass(PassRegistry &);
 extern char &AMDGPUSimplifyLibCallsID;

 void initializeAMDGPUUseNativeCallsPass(PassRegistry &);
 extern char &AMDGPUUseNativeCallsID;

+void initializeSIAddIMGInitPass(PassRegistry &);
+extern char &SIAddIMGInitID;
+
 void initializeAMDGPUPerfHintAnalysisPass(PassRegistry &);
 extern char &AMDGPUPerfHintAnalysisID;

 // Passes common to R600 and SI
 FunctionPass *createAMDGPUPromoteAlloca();
 void initializeAMDGPUPromoteAllocaPass(PassRegistry&);
 extern char &AMDGPUPromoteAllocaID;

llvm/trunk/lib/Target/AMDGPU/AMDGPU.td

@@ ... @@
 >;

 def FeatureEnableDS128 : SubtargetFeature<"enable-ds128",
   "EnableDS128",
   "true",
   "Use ds_{read|write}_b128"
 >;

+// Sparse texture support requires that all result registers are zeroed when
+// PRTStrictNull is set to true. This feature is turned on for all architectures
+// but is enabled as a feature in case there are situations where PRTStrictNull
+// is disabled by the driver.
+def FeatureEnablePRTStrictNull : SubtargetFeature<"enable-prt-strict-null",
+  "EnablePRTStrictNull",
+  "true",
+  "Enable zeroing of result registers for sparse texture fetches"
+>;
+
 // Unless +-flat-for-global is specified, turn on FlatForGlobal for
 // all OS-es on VI and newer hardware to avoid assertion failures due
 // to missing ADDR64 variants of MUBUF instructions.
 // FIXME: moveToVALU should be able to handle converting addr64 MUBUF
 // instructions.

 def FeatureFlatForGlobal : SubtargetFeature<"flat-for-global",
   "FlatForGlobal",

llvm/trunk/lib/Target/AMDGPU/AMDGPUSubtarget.h

@@ ... @@ protected:
   bool DebuggerEmitPrologue;

   // Used as options.
   bool EnableHugePrivateBuffer;
   bool EnableLoadStoreOpt;
   bool EnableUnsafeDSOffsetFolding;
   bool EnableSIScheduler;
   bool EnableDS128;
+  bool EnablePRTStrictNull;
   bool DumpCode;

   // Subtarget statically properties set by tablegen
   bool FP64;
   bool FMA;
   bool MIMG_R128;
   bool IsGCN;
   bool GCN3Encoding;
@@ ... @@ public:
   }

   /// \returns If MUBUF instructions always perform range checking, even for
   /// buffer resources used for private memory access.
   bool privateMemoryResourceIsRangeChecked() const {
     return getGeneration() < AMDGPUSubtarget::GFX9;
   }

+  /// \returns If target requires PRT Struct NULL support (zero result registers
+  /// for sparse texture support).
+  bool usePRTStrictNull() const {
+    return EnablePRTStrictNull;
+  }
+
   bool hasAutoWaitcntBeforeBarrier() const {
     return AutoWaitcntBeforeBarrier;
   }

   bool hasCodeObjectV3() const {
     // FIXME: Need to add code object v3 support for mesa and pal.
     return isAmdHsaOS() ? CodeObjectV3 : false;
   }

llvm/trunk/lib/Target/AMDGPU/AMDGPUSubtarget.cpp

@@ ... @@ GCNSubtarget::initializeSubtargetDependencies(const Triple &TT,
   // Determine default and user-specified characteristics
   // On SI+, we want FP64 denormals to be on by default. FP32 denormals can be
   // enabled, but some instructions do not respect them and they run at the
   // double precision rate, so don't enable by default.
   //
   // We want to be able to turn these off, but making this a subtarget feature
   // for SI has the unhelpful behavior that it unsets everything else if you
   // disable it.
+  //
+  // Similarly we want enable-prt-strict-null to be on by default and not to
+  // unset everything else if it is disabled

   SmallString<256> FullFS("+promote-alloca,+dx10-clamp,+load-store-opt,");

   if (isAmdHsaOS()) // Turn on FlatForGlobal for HSA.
     FullFS += "+flat-address-space,+flat-for-global,+unaligned-buffer-access,+trap-handler,";

   // FIXME: I don't think think Evergreen has any useful support for
   // denormals, but should be checked. Should we issue a warning somewhere
   // if someone tries to enable these?
   if (getGeneration() >= AMDGPUSubtarget::SOUTHERN_ISLANDS) {
     FullFS += "+fp64-fp16-denormals,";
   } else {
     FullFS += "-fp32-denormals,";
   }

+  FullFS += "+enable-prt-strict-null,"; // This is overridden by a disable in FS
+
   FullFS += FS;

   ParseSubtargetFeatures(GPU, FullFS);

   // We don't support FP64 for EG/NI atm.
   assert(!hasFP64() || (getGeneration() >= AMDGPUSubtarget::SOUTHERN_ISLANDS));

   // Unless +-flat-for-global is specified, turn on FlatForGlobal for all OS-es
@@ ... @@ GCNSubtarget::GCNSubtarget(const Triple &TT, StringRef GPU, StringRef FS,
     DebuggerInsertNops(false),
     DebuggerEmitPrologue(false),

     EnableHugePrivateBuffer(false),
     EnableLoadStoreOpt(false),
     EnableUnsafeDSOffsetFolding(false),
     EnableSIScheduler(false),
     EnableDS128(false),
+    EnablePRTStrictNull(false),
     DumpCode(false),

     FP64(false),
     GCN3Encoding(false),
     CIInsts(false),
     VIInsts(false),
     GFX9Insts(false),
     SGPRInitBug(false),

llvm/trunk/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

@@ ... @@ bool GCNPassConfig::addILPOpts() {
   return false;
 }

 bool GCNPassConfig::addInstSelector() {
   AMDGPUPassConfig::addInstSelector();
   addPass(&SIFixSGPRCopiesID);
   addPass(createSILowerI1CopiesPass());
   addPass(createSIFixupVectorISelPass());
+  addPass(createSIAddIMGInitPass());
   return false;
 }

 bool GCNPassConfig::addIRTranslator() {
   addPass(new IRTranslator());
   return false;
 }

llvm/trunk/lib/Target/AMDGPU/CMakeLists.txt

@@ ... @@ add_llvm_target(AMDGPUCodeGen
   R600InstrInfo.cpp
   R600ISelLowering.cpp
   R600MachineFunctionInfo.cpp
   R600MachineScheduler.cpp
   R600OpenCLImageTypeLoweringPass.cpp
   R600OptimizeVectorRegisters.cpp
   R600Packetizer.cpp
   R600RegisterInfo.cpp
+  SIAddIMGInit.cpp
   SIAnnotateControlFlow.cpp
   SIDebuggerInsertNops.cpp
   SIFixSGPRCopies.cpp
   SIFixupVectorISel.cpp
   SIFixVGPRCopies.cpp
   SIFixWWMLiveness.cpp
   SIFoldOperands.cpp
   SIFormMemoryClauses.cpp

llvm/trunk/lib/Target/AMDGPU/MIMGInstructions.td

@@ ... @@
 // Represent an ISA-level opcode, independent of the encoding and the
 // vdata/vaddr size.
 class MIMGBaseOpcode {
   MIMGBaseOpcode BaseOpcode = !cast<MIMGBaseOpcode>(NAME);
   bit Store = 0;
   bit Atomic = 0;
   bit AtomicX2 = 0; // (f)cmpswap
   bit Sampler = 0;
+  bit Gather4 = 0;
   bits<8> NumExtraArgs = 0;
   bit Gradients = 0;
   bit Coordinates = 1;
   bit LodOrClampOrMip = 0;
   bit HasD16 = 0;
 }

 def MIMGBaseOpcode : GenericEnum {
   let FilterClass = "MIMGBaseOpcode";
 }

 def MIMGBaseOpcodesTable : GenericTable {
   let FilterClass = "MIMGBaseOpcode";
   let CppTypeName = "MIMGBaseOpcodeInfo";
-  let Fields = ["BaseOpcode", "Store", "Atomic", "AtomicX2", "Sampler",
+  let Fields = ["BaseOpcode", "Store", "Atomic", "AtomicX2", "Sampler", "Gather4",
                 "NumExtraArgs", "Gradients", "Coordinates", "LodOrClampOrMip",
                 "HasD16"];
   GenericEnum TypeOf_BaseOpcode = MIMGBaseOpcode;

   let PrimaryKey = ["BaseOpcode"];
   let PrimaryKeyName = "getMIMGBaseOpcodeInfo";
 }

@@ ... @@ let BaseOpcode = !cast<MIMGBaseOpcode>(NAME),
     let VDataDwords = 1 in
     defm _V1 : MIMG_NoSampler_Src_Helper <op, asm, VGPR_32, 1>;
     let VDataDwords = 2 in
     defm _V2 : MIMG_NoSampler_Src_Helper <op, asm, VReg_64, 0>;
     let VDataDwords = 3 in
     defm _V3 : MIMG_NoSampler_Src_Helper <op, asm, VReg_96, 0>;
     let VDataDwords = 4 in
     defm _V4 : MIMG_NoSampler_Src_Helper <op, asm, VReg_128, 0>;
+    let VDataDwords = 8 in
+    defm _V8 : MIMG_NoSampler_Src_Helper <op, asm, VReg_256, 0>;
   }
 }

 class MIMG_Store_Helper <bits<7> op, string asm,
                          RegisterClass data_rc,
                          RegisterClass addr_rc,
                          string dns = "">
   : MIMG <(outs), dns>,
@@ ... @@ let BaseOpcode = !cast<MIMGBaseOpcode>(NAME), WQM = wqm,
     let VDataDwords = 1 in
     defm _V1 : MIMG_Sampler_Src_Helper<op, asm, sample, VGPR_32, 1>;
     let VDataDwords = 2 in
     defm _V2 : MIMG_Sampler_Src_Helper<op, asm, sample, VReg_64>;
     let VDataDwords = 3 in
     defm _V3 : MIMG_Sampler_Src_Helper<op, asm, sample, VReg_96>;
     let VDataDwords = 4 in
     defm _V4 : MIMG_Sampler_Src_Helper<op, asm, sample, VReg_128>;
+    let VDataDwords = 8 in
+    defm _V8 : MIMG_Sampler_Src_Helper<op, asm, sample, VReg_256>;
   }
 }

 multiclass MIMG_Sampler_WQM <bits<7> op, AMDGPUSampleVariant sample>
     : MIMG_Sampler<op, sample, 1>;

 multiclass MIMG_Gather <bits<7> op, AMDGPUSampleVariant sample, bit wqm = 0,
                         string asm = "image_gather4"#sample.LowerCaseMod> {
   def "" : MIMG_Sampler_BaseOpcode<sample> {
     let HasD16 = 1;
+    let Gather4 = 1;
   }

   let BaseOpcode = !cast<MIMGBaseOpcode>(NAME), WQM = wqm,
       Gather4 = 1, hasPostISelHook = 0 in {
     let VDataDwords = 2 in
     defm _V2 : MIMG_Sampler_Src_Helper<op, asm, sample, VReg_64>; /* for packed D16 only */
     let VDataDwords = 4 in
     defm _V4 : MIMG_Sampler_Src_Helper<op, asm, sample, VReg_128, 1>;
+    let VDataDwords = 8 in
+    defm _V8 : MIMG_Sampler_Src_Helper<op, asm, sample, VReg_256>;
   }
 }

 multiclass MIMG_Gather_WQM <bits<7> op, AMDGPUSampleVariant sample>
: MIMG_Gather<op, sample, 1>;		: MIMG_Gather<op, sample, 1>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// MIMG Instructions		// MIMG Instructions
[...]

llvm/trunk/lib/Target/AMDGPU/SIAddIMGInit.cpp

				//===-- SIAddIMGInit.cpp - Add any required IMG inits ---------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				/// \file
				/// Any MIMG instructions that use tfe or lwe require an initialization of the
				/// result register that will be written in the case of a memory access failure.
				/// The required code is also added to tie this init code to the result of the
				/// img instruction.
				///
				//===----------------------------------------------------------------------===//
				//

				#include "AMDGPU.h"
				#include "AMDGPUSubtarget.h"
				#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
				#include "SIInstrInfo.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/IR/Function.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Target/TargetMachine.h"

				#define DEBUG_TYPE "si-img-init"

				using namespace llvm;

				namespace {

				class SIAddIMGInit : public MachineFunctionPass {
				public:
				static char ID;

				public:
				SIAddIMGInit() : MachineFunctionPass(ID) {
				initializeSIAddIMGInitPass(*PassRegistry::getPassRegistry());
				}

				bool runOnMachineFunction(MachineFunction &MF) override;

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesCFG();
				MachineFunctionPass::getAnalysisUsage(AU);
				}
				};

				} // End anonymous namespace.

				INITIALIZE_PASS(SIAddIMGInit, DEBUG_TYPE, "SI Add IMG Init", false, false)

				char SIAddIMGInit::ID = 0;

				char &llvm::SIAddIMGInitID = SIAddIMGInit::ID;

				FunctionPass *llvm::createSIAddIMGInitPass() { return new SIAddIMGInit(); }

				bool SIAddIMGInit::runOnMachineFunction(MachineFunction &MF) {
				MachineRegisterInfo &MRI = MF.getRegInfo();
				const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
				const SIInstrInfo *TII = ST.getInstrInfo();
				const SIRegisterInfo *RI = ST.getRegisterInfo();
				bool Changed = false;

				for (MachineFunction::iterator BI = MF.begin(), BE = MF.end(); BI != BE;
				++BI) {
				MachineBasicBlock &MBB = *BI;
				MachineBasicBlock::iterator I, Next;
				for (I = MBB.begin(); I != MBB.end(); I = Next) {
				Next = std::next(I);
				MachineInstr &MI = *I;

				auto Opcode = MI.getOpcode();
				if (TII->isMIMG(Opcode) && !MI.mayStore()) {
				MachineOperand *TFE = TII->getNamedOperand(MI, AMDGPU::OpName::tfe);
				MachineOperand *LWE = TII->getNamedOperand(MI, AMDGPU::OpName::lwe);
				MachineOperand *D16 = TII->getNamedOperand(MI, AMDGPU::OpName::d16);

				// Check for instructions that don't have tfe or lwe fields
				// There shouldn't be any at this point.
				assert( (TFE && LWE) && "Expected tfe and lwe operands in instruction");

				unsigned TFEVal = TFE->getImm();
				unsigned LWEVal = LWE->getImm();
				unsigned D16Val = D16 ? D16->getImm() : 0;

				if (TFEVal || LWEVal) {
				// At least one of TFE or LWE is non-zero
				// We have to insert a suitable initialization of the result value and
				// tie this to the dest of the image instruction.

				const DebugLoc &DL = MI.getDebugLoc();

				int DstIdx =
				AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::vdata);

				// Calculate which dword we have to initialize to 0.
				MachineOperand *MO_Dmask =
				TII->getNamedOperand(MI, AMDGPU::OpName::dmask);

				// check that dmask operand is found.
				assert(MO_Dmask && "Expected dmask operand in instruction");

				unsigned dmask = MO_Dmask->getImm();
				// Determine the number of active lanes taking into account the
				// Gather4 special case
				unsigned ActiveLanes =
				TII->isGather4(Opcode) ? 4 : countPopulation(dmask);

				// Subreg indices are counted from 1
				// When D16 then we want next whole VGPR after write data.
				static_assert(AMDGPU::sub0 == 1 && AMDGPU::sub4 == 5, "Subreg indices different from expected");

				bool Packed = !ST.hasUnpackedD16VMem();

				unsigned InitIdx =
				D16Val && Packed ? ((ActiveLanes + 1) >> 1) + 1 : ActiveLanes + 1;

				// Abandon attempt if the dst size isn't large enough
				// - this is in fact an error but this is picked up elsewhere and
				// reported correctly.
				uint32_t DstSize =
				RI->getRegSizeInBits(*TII->getOpRegClass(MI, DstIdx)) / 32;
				if (DstSize < InitIdx)
				continue;

				// Create a register for the initialization value.
				unsigned PrevDst =
				MRI.createVirtualRegister(TII->getOpRegClass(MI, DstIdx));
				unsigned NewDst = 0; // Final initialized value will be in here

				// If PRTStrictNull feature is enabled (the default) then initialize
				// all the result registers to 0, otherwise just the error indication
				// register (VGPRn+1)
				unsigned SizeLeft = ST.usePRTStrictNull() ? InitIdx : 1;
				unsigned CurrIdx = ST.usePRTStrictNull() ? 1 : InitIdx;

				if (DstSize == 1) {
				// In this case we can just initialize the result directly
				BuildMI(MBB, MI, DL, TII->get(AMDGPU::V_MOV_B32_e32), PrevDst)
				.addImm(0);
				NewDst = PrevDst;
				} else {
				BuildMI(MBB, MI, DL, TII->get(AMDGPU::IMPLICIT_DEF), PrevDst);
				for (; SizeLeft; SizeLeft--, CurrIdx++) {
				NewDst =
				MRI.createVirtualRegister(TII->getOpRegClass(MI, DstIdx));
				// Initialize dword
				unsigned SubReg =
				MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
				BuildMI(MBB, MI, DL, TII->get(AMDGPU::V_MOV_B32_e32), SubReg)
				.addImm(0);
				// Insert into the super-reg
				BuildMI(MBB, I, DL, TII->get(TargetOpcode::INSERT_SUBREG), NewDst)
				.addReg(PrevDst)
				.addReg(SubReg)
				.addImm(CurrIdx);

				PrevDst = NewDst;
				}
				}

				// Add as an implicit operand
				MachineInstrBuilder(MF, MI).addReg(NewDst, RegState::Implicit);

				// Tie the just added implicit operand to the dst
				MI.tieOperands(DstIdx, MI.getNumOperands() - 1);

				Changed = true;
				}
				}
				}
				}

				return Changed;
				}
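
Note on the InitIdx computation above: the following is a minimal standalone sketch, not part of the patch, of how the pass appears to pick the dword that holds the TFE/LWE status. The helper name `tfeStatusDwordIndex` and its parameters are hypothetical; the arithmetic follows the `InitIdx` expression in `runOnMachineFunction`, with `PackedD16` standing in for `!ST.hasUnpackedD16VMem()`.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not in the patch): returns the 1-based dword index of
// the register that receives the TFE/LWE status, mirroring InitIdx.
unsigned tfeStatusDwordIndex(uint32_t DMask, bool IsGather4, bool IsD16,
                             bool PackedD16) {
  assert(DMask <= 0xF && "dmask is a 4-bit channel mask");
  // Gather4 always returns four lanes; otherwise count the enabled channels.
  unsigned ActiveLanes =
      IsGather4 ? 4u : static_cast<unsigned>(std::bitset<4>(DMask).count());
  // With packed D16, two 16-bit lanes share one dword (rounded up).
  unsigned DataDwords =
      (IsD16 && PackedD16) ? (ActiveLanes + 1) / 2 : ActiveLanes;
  // The status dword is the next whole VGPR after the data.
  return DataDwords + 1;
}
```

For example, a full dmask of 0xF without D16 places the status in the fifth dword, which is why the patch adds the `_V8` (VReg_256) variants.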

llvm/trunk/lib/Target/AMDGPU/SIISelLowering.cpp

[...]	SITargetLowering::SITargetLowering(const TargetMachine &TM,
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::v4f32, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::v4f32, Custom);
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom);
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::f16, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::f16, Custom);
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::v2i16, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::v2i16, Custom);
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::v2f16, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::v2f16, Custom);

setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::v2f16, Custom);		setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::v2f16, Custom);
setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::v4f16, Custom);		setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::v4f16, Custom);
		setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::v8f16, Custom);
setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::Other, Custom);		setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::Other, Custom);

setOperationAction(ISD::INTRINSIC_VOID, MVT::Other, Custom);		setOperationAction(ISD::INTRINSIC_VOID, MVT::Other, Custom);
setOperationAction(ISD::INTRINSIC_VOID, MVT::v2i16, Custom);		setOperationAction(ISD::INTRINSIC_VOID, MVT::v2i16, Custom);
setOperationAction(ISD::INTRINSIC_VOID, MVT::v2f16, Custom);		setOperationAction(ISD::INTRINSIC_VOID, MVT::v2f16, Custom);
setOperationAction(ISD::INTRINSIC_VOID, MVT::v4f16, Custom);		setOperationAction(ISD::INTRINSIC_VOID, MVT::v4f16, Custom);

setOperationAction(ISD::BRCOND, MVT::Other, Custom);		setOperationAction(ISD::BRCOND, MVT::Other, Custom);
[...]	if (Size == 16 && Subtarget->has16BitInsts()) {
return NumIntermediates;		return NumIntermediates;
}		}
}		}

return TargetLowering::getVectorTypeBreakdownForCallingConv(		return TargetLowering::getVectorTypeBreakdownForCallingConv(
Context, CC, VT, IntermediateVT, NumIntermediates, RegisterVT);		Context, CC, VT, IntermediateVT, NumIntermediates, RegisterVT);
}		}

		static MVT memVTFromAggregate(Type *Ty) {
		// Only limited forms of aggregate type currently expected.
		assert(Ty->isStructTy() && "Expected struct type");


		Type *ElementType = nullptr;
		unsigned NumElts;
		if (Ty->getContainedType(0)->isVectorTy()) {
		VectorType *VecComponent = cast<VectorType>(Ty->getContainedType(0));
		ElementType = VecComponent->getElementType();
		NumElts = VecComponent->getNumElements();
		} else {
		ElementType = Ty->getContainedType(0);
		NumElts = 1;
		}

		Type *FlagComponent = Ty->getContainedType(1);
		assert(FlagComponent->isIntegerTy(32) && "Expected int32 type");

		// Calculate the size of the memVT type from the aggregate
		unsigned Pow2Elts = 0;
		unsigned ElementSize;
		switch (ElementType->getTypeID()) {
		default:
		llvm_unreachable("Unknown type!");
		case Type::IntegerTyID:
		ElementSize = cast<IntegerType>(ElementType)->getBitWidth();
		break;
		case Type::HalfTyID:
		ElementSize = 16;
		break;
		case Type::FloatTyID:
		ElementSize = 32;
		break;
		}
		unsigned AdditionalElts = ElementSize == 16 ? 2 : 1;
		Pow2Elts = 1 << Log2_32_Ceil(NumElts + AdditionalElts);

		return MVT::getVectorVT(MVT::getVT(ElementType, false),
		Pow2Elts);
		}
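
Note on memVTFromAggregate above: a minimal standalone sketch, not part of the patch, of the element-count arithmetic. `log2Ceil` is a hypothetical stand-in for `llvm::Log2_32_Ceil` that is only meant for inputs of 1 or more, and `aggregateMemElts` is a hypothetical name. The extra element models the i32 status word: one element for 32-bit data, two for 16-bit data.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::Log2_32_Ceil, valid for V >= 1:
// the smallest N such that (1 << N) >= V.
unsigned log2Ceil(unsigned V) {
  assert(V >= 1 && "sketch only handles positive values");
  unsigned N = 0;
  while ((1u << N) < V)
    ++N;
  return N;
}

// Hypothetical helper: number of elements in the memVT derived from an
// aggregate {data, i32 status} return type.
unsigned aggregateMemElts(unsigned NumElts, unsigned ElementSizeBits) {
  // A 32-bit status word occupies two 16-bit elements or one 32-bit element.
  unsigned AdditionalElts = ElementSizeBits == 16 ? 2 : 1;
  // Round up to a power of two, as in the patch.
  return 1u << log2Ceil(NumElts + AdditionalElts);
}
```

Under these assumptions a `{<4 x float>, i32}` return maps to an 8-element memVT, so three of the eight elements are padding.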

bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,		bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
const CallInst &CI,		const CallInst &CI,
MachineFunction &MF,		MachineFunction &MF,
unsigned IntrID) const {		unsigned IntrID) const {
if (const AMDGPU::RsrcIntrinsic *RsrcIntr =		if (const AMDGPU::RsrcIntrinsic *RsrcIntr =
AMDGPU::lookupRsrcIntrinsic(IntrID)) {		AMDGPU::lookupRsrcIntrinsic(IntrID)) {
AttributeList Attr = Intrinsic::getAttributes(CI.getContext(),		AttributeList Attr = Intrinsic::getAttributes(CI.getContext(),
(Intrinsic::ID)IntrID);		(Intrinsic::ID)IntrID);
[...]	if (RsrcIntr->IsImage) {
Info.ptrVal = MFI->getBufferPSV(		Info.ptrVal = MFI->getBufferPSV(
*MF.getSubtarget<GCNSubtarget>().getInstrInfo(),		*MF.getSubtarget<GCNSubtarget>().getInstrInfo(),
CI.getArgOperand(RsrcIntr->RsrcArg));		CI.getArgOperand(RsrcIntr->RsrcArg));
}		}

Info.flags = MachineMemOperand::MODereferenceable;		Info.flags = MachineMemOperand::MODereferenceable;
if (Attr.hasFnAttribute(Attribute::ReadOnly)) {		if (Attr.hasFnAttribute(Attribute::ReadOnly)) {
Info.opc = ISD::INTRINSIC_W_CHAIN;		Info.opc = ISD::INTRINSIC_W_CHAIN;
Info.memVT = MVT::getVT(CI.getType());		Info.memVT = MVT::getVT(CI.getType(), true);
		if (Info.memVT == MVT::Other) {
		// Some intrinsics return an aggregate type - special case to work out
		// the correct memVT
		Info.memVT = memVTFromAggregate(CI.getType());
		}
Info.flags |= MachineMemOperand::MOLoad;		Info.flags |= MachineMemOperand::MOLoad;
} else if (Attr.hasFnAttribute(Attribute::WriteOnly)) {		} else if (Attr.hasFnAttribute(Attribute::WriteOnly)) {
Info.opc = ISD::INTRINSIC_VOID;		Info.opc = ISD::INTRINSIC_VOID;
Info.memVT = MVT::getVT(CI.getArgOperand(0)->getType());		Info.memVT = MVT::getVT(CI.getArgOperand(0)->getType());
Info.flags |= MachineMemOperand::MOStore;		Info.flags |= MachineMemOperand::MOStore;
} else {		} else {
// Atomic		// Atomic
Info.opc = ISD::INTRINSIC_W_CHAIN;		Info.opc = ISD::INTRINSIC_W_CHAIN;
[...]	static bool parseCachePolicy(SDValue CachePolicy, SelectionDAG &DAG,
if (SLC) {		if (SLC) {
*SLC = DAG.getTargetConstant((Value & 0x2) ? 1 : 0, DL, MVT::i32);		*SLC = DAG.getTargetConstant((Value & 0x2) ? 1 : 0, DL, MVT::i32);
Value &= ~(uint64_t)0x2;		Value &= ~(uint64_t)0x2;
}		}

return Value == 0;		return Value == 0;
}		}

		// Re-construct the required return value for an image load intrinsic.
		// This is more complicated due to the optional use of TexFailCtrl, which means the required
		// return type is an aggregate
		static SDValue constructRetValue(SelectionDAG &DAG,
		MachineSDNode *Result,
		ArrayRef<EVT> ResultTypes,
		bool IsTexFail, bool Unpacked, bool IsD16,
		int DMaskPop, int NumVDataDwords,
		const SDLoc &DL, LLVMContext &Context) {
		// Determine the required return type. This is the same regardless of IsTexFail flag
		EVT ReqRetVT = ResultTypes[0];
		EVT ReqRetEltVT = ReqRetVT.isVector() ? ReqRetVT.getVectorElementType() : ReqRetVT;
		int ReqRetNumElts = ReqRetVT.isVector() ? ReqRetVT.getVectorNumElements() : 1;
		EVT AdjEltVT = Unpacked && IsD16 ? MVT::i32 : ReqRetEltVT;
		EVT AdjVT = Unpacked ? ReqRetNumElts > 1 ? EVT::getVectorVT(Context, AdjEltVT, ReqRetNumElts)
		: AdjEltVT
		: ReqRetVT;

		// Extract data part of the result
		// Bitcast the result to the same type as the required return type
		int NumElts;
		if (IsD16 && !Unpacked)
		NumElts = NumVDataDwords << 1;
		else
		NumElts = NumVDataDwords;

		EVT CastVT = NumElts > 1 ? EVT::getVectorVT(Context, AdjEltVT, NumElts)
		: AdjEltVT;

		// Special case for v8f16. Rather than add support for this, use v4i32 to
		// extract the data elements
		bool V8F16Special = false;
		if (CastVT == MVT::v8f16) {
		CastVT = MVT::v4i32;
		DMaskPop >>= 1;
		ReqRetNumElts >>= 1;
		V8F16Special = true;
		AdjVT = MVT::v2i32;
		}

		SDValue N = SDValue(Result, 0);
		SDValue CastRes = DAG.getNode(ISD::BITCAST, DL, CastVT, N);

		// Iterate over the result
		SmallVector<SDValue, 4> BVElts;

		if (CastVT.isVector()) {
		DAG.ExtractVectorElements(CastRes, BVElts, 0, DMaskPop);
		} else {
		BVElts.push_back(CastRes);
		}
		int ExtraElts = ReqRetNumElts - DMaskPop;
		while(ExtraElts--)
		BVElts.push_back(DAG.getUNDEF(AdjEltVT));

		SDValue PreTFCRes;
		if (ReqRetNumElts > 1) {
		SDValue NewVec = DAG.getBuildVector(AdjVT, DL, BVElts);
		if (IsD16 && Unpacked)
		PreTFCRes = adjustLoadValueTypeImpl(NewVec, ReqRetVT, DL, DAG, Unpacked);
		else
		PreTFCRes = NewVec;
		} else {
		PreTFCRes = BVElts[0];
		}

		if (V8F16Special)
		PreTFCRes = DAG.getNode(ISD::BITCAST, DL, MVT::v4f16, PreTFCRes);

		if (!IsTexFail) {
		if (Result->getNumValues() > 1)
		return DAG.getMergeValues({PreTFCRes, SDValue(Result, 1)}, DL);
		else
		return PreTFCRes;
		}

		// Extract the TexFail result and insert into aggregate return
		SmallVector<SDValue, 1> TFCElt;
		DAG.ExtractVectorElements(N, TFCElt, DMaskPop, 1);
		SDValue TFCRes = DAG.getNode(ISD::BITCAST, DL, ResultTypes[1], TFCElt[0]);
		return DAG.getMergeValues({PreTFCRes, TFCRes, SDValue(Result, 1)}, DL);
		}

		static bool parseTexFail(SDValue TexFailCtrl, SelectionDAG &DAG, SDValue *TFE,
		SDValue *LWE, bool &IsTexFail) {
		auto TexFailCtrlConst = dyn_cast<ConstantSDNode>(TexFailCtrl.getNode());
		if (!TexFailCtrlConst)
		return false;

		uint64_t Value = TexFailCtrlConst->getZExtValue();
		if (Value) {
		IsTexFail = true;
		}

		SDLoc DL(TexFailCtrlConst);
		*TFE = DAG.getTargetConstant((Value & 0x1) ? 1 : 0, DL, MVT::i32);
		Value &= ~(uint64_t)0x1;
		*LWE = DAG.getTargetConstant((Value & 0x2) ? 1 : 0, DL, MVT::i32);
		Value &= ~(uint64_t)0x2;

		return Value == 0;
		}
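
Note on parseTexFail above: a minimal standalone sketch, not part of the patch, of how the TexFailCtrl immediate appears to be decoded. The `TexFailFlags` struct and `decodeTexFailCtrl` are hypothetical names; the bit assignments (bit 0 = TFE, bit 1 = LWE, all other bits must be zero) are taken from the function body.

```cpp
#include <cstdint>

// Hypothetical result type for the sketch.
struct TexFailFlags {
  bool TFE;       // bit 0 of TexFailCtrl
  bool LWE;       // bit 1 of TexFailCtrl
  bool IsTexFail; // any bit set, as in the patch
  bool Valid;     // false if bits other than 0 and 1 are set
};

// Hypothetical helper mirroring the bit handling in parseTexFail.
TexFailFlags decodeTexFailCtrl(uint64_t Value) {
  TexFailFlags Flags;
  Flags.IsTexFail = Value != 0;
  Flags.TFE = (Value & 0x1) != 0;
  Flags.LWE = (Value & 0x2) != 0;
  // parseTexFail clears the two known bits and requires the rest to be zero.
  Flags.Valid = (Value & ~uint64_t(0x3)) == 0;
  return Flags;
}
```

In the patch an invalid value makes `lowerImage` return `Op` unchanged rather than lowering the intrinsic.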

SDValue SITargetLowering::lowerImage(SDValue Op,		SDValue SITargetLowering::lowerImage(SDValue Op,
const AMDGPU::ImageDimIntrinsicInfo *Intr,		const AMDGPU::ImageDimIntrinsicInfo *Intr,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
SDLoc DL(Op);		SDLoc DL(Op);
MachineFunction &MF = DAG.getMachineFunction();		MachineFunction &MF = DAG.getMachineFunction();
const GCNSubtarget* ST = &MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget* ST = &MF.getSubtarget<GCNSubtarget>();
const AMDGPU::MIMGBaseOpcodeInfo *BaseOpcode =		const AMDGPU::MIMGBaseOpcodeInfo *BaseOpcode =
AMDGPU::getMIMGBaseOpcodeInfo(Intr->BaseOpcode);		AMDGPU::getMIMGBaseOpcodeInfo(Intr->BaseOpcode);
const AMDGPU::MIMGDimInfo *DimInfo = AMDGPU::getMIMGDimInfo(Intr->Dim);		const AMDGPU::MIMGDimInfo *DimInfo = AMDGPU::getMIMGDimInfo(Intr->Dim);
const AMDGPU::MIMGLZMappingInfo *LZMappingInfo =		const AMDGPU::MIMGLZMappingInfo *LZMappingInfo =
AMDGPU::getMIMGLZMappingInfo(Intr->BaseOpcode);		AMDGPU::getMIMGLZMappingInfo(Intr->BaseOpcode);
unsigned IntrOpcode = Intr->BaseOpcode;		unsigned IntrOpcode = Intr->BaseOpcode;

SmallVector<EVT, 2> ResultTypes(Op->value_begin(), Op->value_end());		SmallVector<EVT, 3> ResultTypes(Op->value_begin(), Op->value_end());
		SmallVector<EVT, 3> OrigResultTypes(Op->value_begin(), Op->value_end());
bool IsD16 = false;		bool IsD16 = false;
bool IsA16 = false;		bool IsA16 = false;
SDValue VData;		SDValue VData;
int NumVDataDwords;		int NumVDataDwords;
		bool AdjustRetType = false;

unsigned AddrIdx; // Index of first address argument		unsigned AddrIdx; // Index of first address argument
unsigned DMask;		unsigned DMask;
		unsigned DMaskLanes = 0;

if (BaseOpcode->Atomic) {		if (BaseOpcode->Atomic) {
VData = Op.getOperand(2);		VData = Op.getOperand(2);

bool Is64Bit = VData.getValueType() == MVT::i64;		bool Is64Bit = VData.getValueType() == MVT::i64;
if (BaseOpcode->AtomicX2) {		if (BaseOpcode->AtomicX2) {
SDValue VData2 = Op.getOperand(3);		SDValue VData2 = Op.getOperand(3);
VData = DAG.getBuildVector(Is64Bit ? MVT::v2i64 : MVT::v2i32, DL,		VData = DAG.getBuildVector(Is64Bit ? MVT::v2i64 : MVT::v2i32, DL,
{VData, VData2});		{VData, VData2});
if (Is64Bit)		if (Is64Bit)
VData = DAG.getBitcast(MVT::v4i32, VData);		VData = DAG.getBitcast(MVT::v4i32, VData);

ResultTypes[0] = Is64Bit ? MVT::v2i64 : MVT::v2i32;		ResultTypes[0] = Is64Bit ? MVT::v2i64 : MVT::v2i32;
DMask = Is64Bit ? 0xf : 0x3;		DMask = Is64Bit ? 0xf : 0x3;
NumVDataDwords = Is64Bit ? 4 : 2;		NumVDataDwords = Is64Bit ? 4 : 2;
AddrIdx = 4;		AddrIdx = 4;
} else {		} else {
DMask = Is64Bit ? 0x3 : 0x1;		DMask = Is64Bit ? 0x3 : 0x1;
NumVDataDwords = Is64Bit ? 2 : 1;		NumVDataDwords = Is64Bit ? 2 : 1;
AddrIdx = 3;		AddrIdx = 3;
}		}
} else {		} else {
unsigned DMaskIdx;		unsigned DMaskIdx = BaseOpcode->Store ? 3 : isa<MemSDNode>(Op) ? 2 : 1;
		auto DMaskConst = dyn_cast<ConstantSDNode>(Op.getOperand(DMaskIdx));
		if (!DMaskConst)
		return Op;
		DMask = DMaskConst->getZExtValue();
		DMaskLanes = BaseOpcode->Gather4 ? 4 : countPopulation(DMask);

if (BaseOpcode->Store) {		if (BaseOpcode->Store) {
VData = Op.getOperand(2);		VData = Op.getOperand(2);

MVT StoreVT = VData.getSimpleValueType();		MVT StoreVT = VData.getSimpleValueType();
if (StoreVT.getScalarType() == MVT::f16) {		if (StoreVT.getScalarType() == MVT::f16) {
if (Subtarget->getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS ||		if (Subtarget->getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS ||
!BaseOpcode->HasD16)		!BaseOpcode->HasD16)
return Op; // D16 is unsupported for this instruction		return Op; // D16 is unsupported for this instruction

IsD16 = true;		IsD16 = true;
VData = handleD16VData(VData, DAG);		VData = handleD16VData(VData, DAG);
}		}

NumVDataDwords = (VData.getValueType().getSizeInBits() + 31) / 32;		NumVDataDwords = (VData.getValueType().getSizeInBits() + 31) / 32;
DMaskIdx = 3;
} else {		} else {
MVT LoadVT = Op.getSimpleValueType();		// Work out the num dwords based on the dmask popcount and underlying type
		// and whether packing is supported.
		MVT LoadVT = ResultTypes[0].getSimpleVT();
if (LoadVT.getScalarType() == MVT::f16) {		if (LoadVT.getScalarType() == MVT::f16) {
if (Subtarget->getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS ||		if (Subtarget->getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS ||
!BaseOpcode->HasD16)		!BaseOpcode->HasD16)
return Op; // D16 is unsupported for this instruction		return Op; // D16 is unsupported for this instruction

IsD16 = true;		IsD16 = true;
if (LoadVT.isVector() && Subtarget->hasUnpackedD16VMem())
ResultTypes[0] = (LoadVT == MVT::v2f16) ? MVT::v2i32 : MVT::v4i32;
}		}

NumVDataDwords = (ResultTypes[0].getSizeInBits() + 31) / 32;		// Confirm that the return type is large enough for the dmask specified
DMaskIdx = isa<MemSDNode>(Op) ? 2 : 1;		if ((LoadVT.isVector() && LoadVT.getVectorNumElements() < DMaskLanes) ||
}		(!LoadVT.isVector() && DMaskLanes > 1))

auto DMaskConst = dyn_cast<ConstantSDNode>(Op.getOperand(DMaskIdx));
if (!DMaskConst)
return Op;		return Op;

AddrIdx = DMaskIdx + 1;		if (IsD16 && !Subtarget->hasUnpackedD16VMem())
DMask = DMaskConst->getZExtValue();		NumVDataDwords = (DMaskLanes + 1) / 2;
if (!DMask && !BaseOpcode->Store) {		else
// Eliminate no-op loads. Stores with dmask == 0 are not no-op: they		NumVDataDwords = DMaskLanes;
// store the channels' default values.
SDValue Undef = DAG.getUNDEF(Op.getValueType());		AdjustRetType = true;
if (isa<MemSDNode>(Op))
return DAG.getMergeValues({Undef, Op.getOperand(0)}, DL);
return Undef;
}		}

		AddrIdx = DMaskIdx + 1;
}		}

unsigned NumGradients = BaseOpcode->Gradients ? DimInfo->NumGradients : 0;		unsigned NumGradients = BaseOpcode->Gradients ? DimInfo->NumGradients : 0;
unsigned NumCoords = BaseOpcode->Coordinates ? DimInfo->NumCoords : 0;		unsigned NumCoords = BaseOpcode->Coordinates ? DimInfo->NumCoords : 0;
unsigned NumLCM = BaseOpcode->LodOrClampOrMip ? 1 : 0;		unsigned NumLCM = BaseOpcode->LodOrClampOrMip ? 1 : 0;
unsigned NumVAddrs = BaseOpcode->NumExtraArgs + NumGradients +		unsigned NumVAddrs = BaseOpcode->NumExtraArgs + NumGradients +
NumCoords + NumLCM;		NumCoords + NumLCM;
unsigned NumMIVAddrs = NumVAddrs;		unsigned NumMIVAddrs = NumVAddrs;
[...]	auto UnormConst =
dyn_cast<ConstantSDNode>(Op.getOperand(AddrIdx + NumVAddrs + 2));		dyn_cast<ConstantSDNode>(Op.getOperand(AddrIdx + NumVAddrs + 2));
if (!UnormConst)		if (!UnormConst)
return Op;		return Op;

Unorm = UnormConst->getZExtValue() ? True : False;		Unorm = UnormConst->getZExtValue() ? True : False;
CtrlIdx = AddrIdx + NumVAddrs + 3;		CtrlIdx = AddrIdx + NumVAddrs + 3;
}		}

		SDValue TFE;
		SDValue LWE;
SDValue TexFail = Op.getOperand(CtrlIdx);		SDValue TexFail = Op.getOperand(CtrlIdx);
auto TexFailConst = dyn_cast<ConstantSDNode>(TexFail.getNode());		bool IsTexFail = false;
if (!TexFailConst || TexFailConst->getZExtValue() != 0)		if (!parseTexFail(TexFail, DAG, &TFE, &LWE, IsTexFail))
return Op;		return Op;

		if (IsTexFail) {
		if (!NumVDataDwords) {
		// Expecting to get an error flag since TFC is on - and dmask is 0
		// Force dmask to be at least 1 otherwise the instruction will fail
		DMask = 0x1;
		DMaskLanes = 1;
		NumVDataDwords = 1;
		}
		NumVDataDwords += 1;
		AdjustRetType = true;
		}

		// Has something earlier tagged that the return type needs adjusting
		// This happens if the instruction is a load or has set TexFailCtrl flags
		if (AdjustRetType) {
		// NumVDataDwords reflects the true number of dwords required in the return type
		if (NumVDataDwords == 0 && !BaseOpcode->Store) {
		// This is a no-op load. This can be eliminated
		SDValue Undef = DAG.getUNDEF(Op.getValueType());
		if (isa<MemSDNode>(Op))
		return DAG.getMergeValues({Undef, Op.getOperand(0)}, DL);
		return Undef;
		}

		// Have to use a power of 2 number of dwords
		NumVDataDwords = 1 << Log2_32_Ceil(NumVDataDwords);

		EVT NewVT = NumVDataDwords > 1 ?
		EVT::getVectorVT(*DAG.getContext(), MVT::f32, NumVDataDwords)
		: MVT::f32;

		ResultTypes[0] = NewVT;
		if (ResultTypes.size() == 3) {
		// Original result was aggregate type used for TexFailCtrl results
		// The actual instruction returns as a vector type which has now been
		// created. Remove the aggregate result.
		ResultTypes.erase(&ResultTypes[1]);
		}
		}

SDValue GLC;		SDValue GLC;
SDValue SLC;		SDValue SLC;
if (BaseOpcode->Atomic) {		if (BaseOpcode->Atomic) {
GLC = True; // TODO no-return optimization		GLC = True; // TODO no-return optimization
if (!parseCachePolicy(Op.getOperand(CtrlIdx + 1), DAG, nullptr, &SLC))		if (!parseCachePolicy(Op.getOperand(CtrlIdx + 1), DAG, nullptr, &SLC))
return Op;		return Op;
} else {		} else {
if (!parseCachePolicy(Op.getOperand(CtrlIdx + 1), DAG, &GLC, &SLC))		if (!parseCachePolicy(Op.getOperand(CtrlIdx + 1), DAG, &GLC, &SLC))
return Op;		return Op;
}		}

SmallVector<SDValue, 14> Ops;		SmallVector<SDValue, 14> Ops;
if (BaseOpcode->Store || BaseOpcode->Atomic)		if (BaseOpcode->Store || BaseOpcode->Atomic)
Ops.push_back(VData); // vdata		Ops.push_back(VData); // vdata
Ops.push_back(VAddr);		Ops.push_back(VAddr);
Ops.push_back(Op.getOperand(AddrIdx + NumVAddrs)); // rsrc		Ops.push_back(Op.getOperand(AddrIdx + NumVAddrs)); // rsrc
if (BaseOpcode->Sampler)		if (BaseOpcode->Sampler)
Ops.push_back(Op.getOperand(AddrIdx + NumVAddrs + 1)); // sampler		Ops.push_back(Op.getOperand(AddrIdx + NumVAddrs + 1)); // sampler
Ops.push_back(DAG.getTargetConstant(DMask, DL, MVT::i32));		Ops.push_back(DAG.getTargetConstant(DMask, DL, MVT::i32));
Ops.push_back(Unorm);		Ops.push_back(Unorm);
Ops.push_back(GLC);		Ops.push_back(GLC);
Ops.push_back(SLC);		Ops.push_back(SLC);
Ops.push_back(IsA16 && // a16 or r128		Ops.push_back(IsA16 && // a16 or r128
ST->hasFeature(AMDGPU::FeatureR128A16) ? True : False);		ST->hasFeature(AMDGPU::FeatureR128A16) ? True : False);
Ops.push_back(False); // tfe		Ops.push_back(TFE); // tfe
Ops.push_back(False); // lwe		Ops.push_back(LWE); // lwe
Ops.push_back(DimInfo->DA ? True : False);		Ops.push_back(DimInfo->DA ? True : False);
if (BaseOpcode->HasD16)		if (BaseOpcode->HasD16)
Ops.push_back(IsD16 ? True : False);		Ops.push_back(IsD16 ? True : False);
if (isa<MemSDNode>(Op))		if (isa<MemSDNode>(Op))
Ops.push_back(Op.getOperand(0)); // chain		Ops.push_back(Op.getOperand(0)); // chain

int NumVAddrDwords = VAddr.getValueType().getSizeInBits() / 32;		int NumVAddrDwords = VAddr.getValueType().getSizeInBits() / 32;
int Opcode = -1;		int Opcode = -1;
[...]	if (auto MemOp = dyn_cast<MemSDNode>(Op)) {
MachineMemOperand *MemRef = MemOp->getMemOperand();		MachineMemOperand *MemRef = MemOp->getMemOperand();
DAG.setNodeMemRefs(NewNode, {MemRef});		DAG.setNodeMemRefs(NewNode, {MemRef});
}		}

if (BaseOpcode->AtomicX2) {		if (BaseOpcode->AtomicX2) {
SmallVector<SDValue, 1> Elt;		SmallVector<SDValue, 1> Elt;
DAG.ExtractVectorElements(SDValue(NewNode, 0), Elt, 0, 1);		DAG.ExtractVectorElements(SDValue(NewNode, 0), Elt, 0, 1);
return DAG.getMergeValues({Elt[0], SDValue(NewNode, 1)}, DL);		return DAG.getMergeValues({Elt[0], SDValue(NewNode, 1)}, DL);
} else if (IsD16 && !BaseOpcode->Store) {		} else if (!BaseOpcode->Store) {
MVT LoadVT = Op.getSimpleValueType();		return constructRetValue(DAG, NewNode,
SDValue Adjusted = adjustLoadValueTypeImpl(		OrigResultTypes, IsTexFail,
SDValue(NewNode, 0), LoadVT, DL, DAG, Subtarget->hasUnpackedD16VMem());		Subtarget->hasUnpackedD16VMem(), IsD16,
return DAG.getMergeValues({Adjusted, SDValue(NewNode, 1)}, DL);		DMaskLanes, NumVDataDwords, DL,
		*DAG.getContext());
}		}

return SDValue(NewNode, 0);		return SDValue(NewNode, 0);
}		}
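
Note on the load path of lowerImage above: a minimal standalone sketch, not part of the patch, of how the result size in dwords appears to be derived for loads. `loadResultDwords` is a hypothetical name, and `PackedD16` stands in for `!Subtarget->hasUnpackedD16VMem()`. A return value of 0 models the no-op load that the patch folds to undef.

```cpp
// Hypothetical helper mirroring the NumVDataDwords computation for loads.
unsigned loadResultDwords(unsigned DMaskLanes, bool IsD16, bool PackedD16,
                          bool IsTexFail) {
  // Packed D16 fits two lanes per dword; otherwise one dword per lane.
  unsigned Dwords = (IsD16 && PackedD16) ? (DMaskLanes + 1) / 2 : DMaskLanes;
  if (IsTexFail) {
    // With TFE/LWE on and dmask == 0 the patch forces one data dword.
    if (Dwords == 0)
      Dwords = 1;
    // One extra dword for the status result.
    Dwords += 1;
  }
  if (Dwords == 0)
    return 0; // no-op load, eliminated by the lowering
  // The register class must be a power-of-2 number of dwords.
  unsigned Pow2 = 1;
  while (Pow2 < Dwords)
    Pow2 <<= 1;
  return Pow2;
}
```

Under these assumptions a four-lane load with TFE needs five dwords and is rounded up to eight, which matches the summary's remark about redundant registers.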

SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,		SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
MachineFunction &MF = DAG.getMachineFunction();		MachineFunction &MF = DAG.getMachineFunction();
[...]
/// Helper function for adjustWritemask		/// Helper function for adjustWritemask
static unsigned SubIdx2Lane(unsigned Idx) {		static unsigned SubIdx2Lane(unsigned Idx) {
switch (Idx) {		switch (Idx) {
default: return 0;		default: return 0;
case AMDGPU::sub0: return 0;		case AMDGPU::sub0: return 0;
case AMDGPU::sub1: return 1;		case AMDGPU::sub1: return 1;
case AMDGPU::sub2: return 2;		case AMDGPU::sub2: return 2;
case AMDGPU::sub3: return 3;		case AMDGPU::sub3: return 3;
		case AMDGPU::sub4: return 4; // Possible with TFE/LWE
}		}
}		}

/// Adjust the writemask of MIMG instructions		/// Adjust the writemask of MIMG instructions
SDNode *SITargetLowering::adjustWritemask(MachineSDNode *&Node,		SDNode *SITargetLowering::adjustWritemask(MachineSDNode *&Node,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
unsigned Opcode = Node->getMachineOpcode();		unsigned Opcode = Node->getMachineOpcode();

// Subtract 1 because the vdata output is not a MachineSDNode operand.		// Subtract 1 because the vdata output is not a MachineSDNode operand.
int D16Idx = AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::d16) - 1;		int D16Idx = AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::d16) - 1;
if (D16Idx >= 0 && Node->getConstantOperandVal(D16Idx))		if (D16Idx >= 0 && Node->getConstantOperandVal(D16Idx))
return Node; // not implemented for D16		return Node; // not implemented for D16

SDNode *Users[4] = { nullptr };		SDNode *Users[5] = { nullptr };
unsigned Lane = 0;		unsigned Lane = 0;
unsigned DmaskIdx = AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::dmask) - 1;		unsigned DmaskIdx = AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::dmask) - 1;
unsigned OldDmask = Node->getConstantOperandVal(DmaskIdx);		unsigned OldDmask = Node->getConstantOperandVal(DmaskIdx);
unsigned NewDmask = 0;		unsigned NewDmask = 0;
		unsigned TFEIdx = AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::tfe) - 1;
		unsigned LWEIdx = AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::lwe) - 1;
		bool UsesTFC = (Node->getConstantOperandVal(TFEIdx) ||
		Node->getConstantOperandVal(LWEIdx)) ? 1 : 0;
		unsigned TFCLane = 0;
bool HasChain = Node->getNumValues() > 1;		bool HasChain = Node->getNumValues() > 1;

if (OldDmask == 0) {		if (OldDmask == 0) {
// These are folded out, but on the chance it happens don't assert.		// These are folded out, but on the chance it happens don't assert.
return Node;		return Node;
}		}

		unsigned OldBitsSet = countPopulation(OldDmask);
		// Work out which is the TFE/LWE lane if that is enabled.
		if (UsesTFC) {
		TFCLane = OldBitsSet;
		}

// Try to figure out the used register components		// Try to figure out the used register components
for (SDNode::use_iterator I = Node->use_begin(), E = Node->use_end();		for (SDNode::use_iterator I = Node->use_begin(), E = Node->use_end();
I != E; ++I) {		I != E; ++I) {

// Don't look at users of the chain.		// Don't look at users of the chain.
if (I.getUse().getResNo() != 0)		if (I.getUse().getResNo() != 0)
continue;		continue;

// Abort if we can't understand the usage		// Abort if we can't understand the usage
if (!I->isMachineOpcode() ||		if (!I->isMachineOpcode() ||
I->getMachineOpcode() != TargetOpcode::EXTRACT_SUBREG)		I->getMachineOpcode() != TargetOpcode::EXTRACT_SUBREG)
return Node;		return Node;

// Lane means which subreg of %vgpra_vgprb_vgprc_vgprd is used.		// Lane means which subreg of %vgpra_vgprb_vgprc_vgprd is used.
// Note that subregs are packed, i.e. Lane==0 is the first bit set		// Note that subregs are packed, i.e. Lane==0 is the first bit set
// in OldDmask, so it can be any of X,Y,Z,W; Lane==1 is the second bit		// in OldDmask, so it can be any of X,Y,Z,W; Lane==1 is the second bit
// set, etc.		// set, etc.
Lane = SubIdx2Lane(I->getConstantOperandVal(1));		Lane = SubIdx2Lane(I->getConstantOperandVal(1));

		// Check if the use is for the TFE/LWE generated result at VGPRn+1.
		if (UsesTFC && Lane == TFCLane) {
		Users[Lane] = *I;
		} else {
// Set which texture component corresponds to the lane.		// Set which texture component corresponds to the lane.
unsigned Comp;		unsigned Comp;
for (unsigned i = 0, Dmask = OldDmask; (i <= Lane) && (Dmask != 0); i++) {		for (unsigned i = 0, Dmask = OldDmask; (i <= Lane) && (Dmask != 0); i++) {
Comp = countTrailingZeros(Dmask);		Comp = countTrailingZeros(Dmask);
Dmask &= ~(1 << Comp);		Dmask &= ~(1 << Comp);
}		}

// Abort if we have more than one user per component		// Abort if we have more than one user per component.
if (Users[Lane])		if (Users[Lane])
return Node;		return Node;

Users[Lane] = *I;		Users[Lane] = *I;
NewDmask |= 1 << Comp;		NewDmask |= 1 << Comp;
}		}
		}

		// Don't allow 0 dmask, as hardware assumes one channel enabled.
		bool NoChannels = !NewDmask;
		if (NoChannels) {
		// If the original dmask has one channel - then nothing to do
		if (OldBitsSet == 1)
		return Node;
		// Use an arbitrary dmask - required for the instruction to work
		NewDmask = 1;
		}
// Abort if there's no change		// Abort if there's no change
if (NewDmask == OldDmask)		if (NewDmask == OldDmask)
return Node;		return Node;

unsigned BitsSet = countPopulation(NewDmask);		unsigned BitsSet = countPopulation(NewDmask);

int NewOpcode = AMDGPU::getMaskedMIMGOp(Node->getMachineOpcode(), BitsSet);		// Check for TFE or LWE - increase the number of channels by one to account
		// for the extra return value
		// This will need adjustment for D16 if this is also included in
		// adjustWriteMask (this function) but at present D16 are excluded.
		unsigned NewChannels = BitsSet + UsesTFC;

		int NewOpcode =
		AMDGPU::getMaskedMIMGOp(Node->getMachineOpcode(), NewChannels);
assert(NewOpcode != -1 &&		assert(NewOpcode != -1 &&
NewOpcode != static_cast<int>(Node->getMachineOpcode()) &&		NewOpcode != static_cast<int>(Node->getMachineOpcode()) &&
"failed to find equivalent MIMG op");		"failed to find equivalent MIMG op");

// Adjust the writemask in the node		// Adjust the writemask in the node
SmallVector<SDValue, 12> Ops;		SmallVector<SDValue, 12> Ops;
Ops.insert(Ops.end(), Node->op_begin(), Node->op_begin() + DmaskIdx);		Ops.insert(Ops.end(), Node->op_begin(), Node->op_begin() + DmaskIdx);
Ops.push_back(DAG.getTargetConstant(NewDmask, SDLoc(Node), MVT::i32));		Ops.push_back(DAG.getTargetConstant(NewDmask, SDLoc(Node), MVT::i32));
Ops.insert(Ops.end(), Node->op_begin() + DmaskIdx + 1, Node->op_end());		Ops.insert(Ops.end(), Node->op_begin() + DmaskIdx + 1, Node->op_end());

MVT SVT = Node->getValueType(0).getVectorElementType().getSimpleVT();		MVT SVT = Node->getValueType(0).getVectorElementType().getSimpleVT();

MVT ResultVT = BitsSet == 1 ?		MVT ResultVT = NewChannels == 1 ?
SVT : MVT::getVectorVT(SVT, BitsSet == 3 ? 4 : BitsSet);		SVT : MVT::getVectorVT(SVT, NewChannels == 3 ? 4 :
		NewChannels == 5 ? 8 : NewChannels);
SDVTList NewVTList = HasChain ?		SDVTList NewVTList = HasChain ?
DAG.getVTList(ResultVT, MVT::Other) : DAG.getVTList(ResultVT);		DAG.getVTList(ResultVT, MVT::Other) : DAG.getVTList(ResultVT);


MachineSDNode *NewNode = DAG.getMachineNode(NewOpcode, SDLoc(Node),		MachineSDNode *NewNode = DAG.getMachineNode(NewOpcode, SDLoc(Node),
NewVTList, Ops);		NewVTList, Ops);

if (HasChain) {		if (HasChain) {
// Update chain.		// Update chain.
DAG.setNodeMemRefs(NewNode, Node->memoperands());		DAG.setNodeMemRefs(NewNode, Node->memoperands());
DAG.ReplaceAllUsesOfValueWith(SDValue(Node, 1), SDValue(NewNode, 1));		DAG.ReplaceAllUsesOfValueWith(SDValue(Node, 1), SDValue(NewNode, 1));
}		}

if (BitsSet == 1) {		if (NewChannels == 1) {
assert(Node->hasNUsesOfValue(1, 0));		assert(Node->hasNUsesOfValue(1, 0));
SDNode *Copy = DAG.getMachineNode(TargetOpcode::COPY,		SDNode *Copy = DAG.getMachineNode(TargetOpcode::COPY,
SDLoc(Node), Users[Lane]->getValueType(0),		SDLoc(Node), Users[Lane]->getValueType(0),
SDValue(NewNode, 0));		SDValue(NewNode, 0));
DAG.ReplaceAllUsesWith(Users[Lane], Copy);		DAG.ReplaceAllUsesWith(Users[Lane], Copy);
return nullptr;		return nullptr;
}		}

// Update the users of the node with the new indices		// Update the users of the node with the new indices
for (unsigned i = 0, Idx = AMDGPU::sub0; i < 4; ++i) {		for (unsigned i = 0, Idx = AMDGPU::sub0; i < 5; ++i) {
SDNode *User = Users[i];		SDNode *User = Users[i];
if (!User)		if (!User) {
		// Handle the special case of NoChannels. We set NewDmask to 1 above, but
		// Users[0] is still nullptr because channel 0 doesn't really have a use.
		if (i || !NoChannels)
continue;		continue;
		} else {
SDValue Op = DAG.getTargetConstant(Idx, SDLoc(User), MVT::i32);		SDValue Op = DAG.getTargetConstant(Idx, SDLoc(User), MVT::i32);
DAG.UpdateNodeOperands(User, SDValue(NewNode, 0), Op);		DAG.UpdateNodeOperands(User, SDValue(NewNode, 0), Op);
		}

switch (Idx) {		switch (Idx) {
default: break;		default: break;
case AMDGPU::sub0: Idx = AMDGPU::sub1; break;		case AMDGPU::sub0: Idx = AMDGPU::sub1; break;
case AMDGPU::sub1: Idx = AMDGPU::sub2; break;		case AMDGPU::sub1: Idx = AMDGPU::sub2; break;
case AMDGPU::sub2: Idx = AMDGPU::sub3; break;		case AMDGPU::sub2: Idx = AMDGPU::sub3; break;
		case AMDGPU::sub3: Idx = AMDGPU::sub4; break;
}		}
}		}

DAG.RemoveDeadNode(Node);		DAG.RemoveDeadNode(Node);
return nullptr;		return nullptr;
}		}
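
The lane bookkeeping in adjustWritemask can be hard to follow from the diff alone. The following is a minimal standalone sketch, not code from the patch: `laneToComponent`, `countChannels`, and `resultVectorWidth` are hypothetical names, and the sketch assumes only the low four dmask bits matter. It shows how a packed lane maps back to a texture component and how the result width is rounded once the extra TFE/LWE dword is counted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the patch): returns the texture component
// (0 = X, 1 = Y, 2 = Z, 3 = W) selected by the Lane-th set bit of Dmask,
// or -1 if Dmask has fewer than Lane + 1 bits set.
int laneToComponent(uint32_t Dmask, unsigned Lane) {
  for (int Comp = 0; Comp < 4; ++Comp) {
    if ((Dmask & (1u << Comp)) == 0)
      continue;
    if (Lane == 0)
      return Comp;
    --Lane;
  }
  return -1;
}

// Hypothetical helper: number of set bits in the low four dmask bits.
unsigned countChannels(uint32_t Dmask) {
  unsigned N = 0;
  for (int Comp = 0; Comp < 4; ++Comp)
    N += (Dmask >> Comp) & 1u;
  return N;
}

// Element count of the result type: one per enabled channel, plus one status
// dword when TFE or LWE is set, rounded 3 -> 4 and 5 -> 8 as in the patch.
unsigned resultVectorWidth(uint32_t NewDmask, bool UsesTFC) {
  unsigned NewChannels = countChannels(NewDmask) + (UsesTFC ? 1u : 0u);
  if (NewChannels == 3)
    return 4;
  if (NewChannels == 5)
    return 8;
  return NewChannels;
}
```

With a dmask of 0xA, lane 0 is Y and lane 1 is W; if TFE or LWE is set, the status dword occupies the next lane, which is why the patch sets TFCLane to the population count of the old dmask.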

static bool isFrameIndexOp(SDValue Op) {		static bool isFrameIndexOp(SDValue Op) {

llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp

Show First 20 Lines • Show All 2,962 Lines • ▼ Show 20 Lines	if (DstUnused && DstUnused->isImm() &&
} else if (TargetRegisterInfo::isPhysicalRegister(TiedMO.getReg()) &&		} else if (TargetRegisterInfo::isPhysicalRegister(TiedMO.getReg()) &&
Dst.getReg() != TiedMO.getReg()) {		Dst.getReg() != TiedMO.getReg()) {
ErrInfo = "Dst register should use same physical register as preserved";		ErrInfo = "Dst register should use same physical register as preserved";
return false;		return false;
}		}
}		}
}		}

		// Verify MIMG
		if (isMIMG(MI.getOpcode()) && !MI.mayStore()) {
		// Ensure that the return type used is large enough for all the options
		// being used. TFE/LWE require an extra result register.
		const MachineOperand *DMask = getNamedOperand(MI, AMDGPU::OpName::dmask);
		if (DMask) {
		uint64_t DMaskImm = DMask->getImm();
		uint32_t RegCount =
		isGather4(MI.getOpcode()) ? 4 : countPopulation(DMaskImm);
		const MachineOperand *TFE = getNamedOperand(MI, AMDGPU::OpName::tfe);
		const MachineOperand *LWE = getNamedOperand(MI, AMDGPU::OpName::lwe);
		const MachineOperand *D16 = getNamedOperand(MI, AMDGPU::OpName::d16);

		// Adjust for packed 16 bit values
		if (D16 && D16->getImm() && !ST.hasUnpackedD16VMem())
		RegCount >>= 1;

		// Adjust if using LWE or TFE
		if ((LWE && LWE->getImm()) || (TFE && TFE->getImm()))
		RegCount += 1;

		const uint32_t DstIdx =
		AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::vdata);
		const MachineOperand &Dst = MI.getOperand(DstIdx);
		if (Dst.isReg()) {
		const TargetRegisterClass *DstRC = getOpRegClass(MI, DstIdx);
		uint32_t DstSize = RI.getRegSizeInBits(*DstRC) / 32;
		if (RegCount > DstSize) {
		ErrInfo = "MIMG instruction returns too many registers for dst "
		"register class";
		return false;
		}
		}
		}
		}
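
As a rough illustration of the arithmetic in the verifier check above, here is a standalone sketch. The `MimgFlags` struct and `requiredDstRegs` function are hypothetical and stand in for the named-operand lookups; the sketch mirrors the computation as written in this diff, including the truncating shift for packed D16.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flattened view of the operands the verifier inspects.
struct MimgFlags {
  uint64_t DMask;  // dmask immediate
  bool Gather4;    // gather4 opcodes always return four values
  bool D16;        // d16 operand is set
  bool PackedD16;  // subtarget packs two 16-bit values per register
  bool TFE;        // tfe operand is set
  bool LWE;        // lwe operand is set
};

// Number of 32-bit result registers the instruction is expected to write.
uint32_t requiredDstRegs(const MimgFlags &F) {
  uint32_t RegCount = 0;
  if (F.Gather4) {
    RegCount = 4;
  } else {
    // Population count of the dmask immediate.
    for (uint64_t M = F.DMask; M != 0; M &= M - 1)
      ++RegCount;
  }
  // Packed 16-bit values share registers (same shift as in the diff).
  if (F.D16 && F.PackedD16)
    RegCount >>= 1;
  // TFE or LWE adds one status register after the data.
  if (F.TFE || F.LWE)
    RegCount += 1;
  return RegCount;
}
```

For example, a full dmask of 0xf with TFE set needs five registers, which is why the tests in this change expect an eight-wide destination such as v[0:7].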

// Verify VOP*. Ignore multiple sgpr operands on writelane.		// Verify VOP*. Ignore multiple sgpr operands on writelane.
if (Desc.getOpcode() != AMDGPU::V_WRITELANE_B32		if (Desc.getOpcode() != AMDGPU::V_WRITELANE_B32
&& (isVOP1(MI) || isVOP2(MI) || isVOP3(MI) || isVOPC(MI) || isSDWA(MI))) {		&& (isVOP1(MI) || isVOP2(MI) || isVOP3(MI) || isVOPC(MI) || isSDWA(MI))) {
// Only look at the true operands. Only a real operand can use the constant		// Only look at the true operands. Only a real operand can use the constant
// bus, and we don't want to check pseudo-operands like the source modifier		// bus, and we don't want to check pseudo-operands like the source modifier
// flags.		// flags.
const int OpIndices[] = { Src0Idx, Src1Idx, Src2Idx };		const int OpIndices[] = { Src0Idx, Src1Idx, Src2Idx };


llvm/trunk/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h

	int16_t getNamedOperandIdx(uint16_t Opcode, uint16_t NamedIdx);			int16_t getNamedOperandIdx(uint16_t Opcode, uint16_t NamedIdx);

	struct MIMGBaseOpcodeInfo {			struct MIMGBaseOpcodeInfo {
	MIMGBaseOpcode BaseOpcode;			MIMGBaseOpcode BaseOpcode;
	bool Store;			bool Store;
	bool Atomic;			bool Atomic;
	bool AtomicX2;			bool AtomicX2;
	bool Sampler;			bool Sampler;
				bool Gather4;

	uint8_t NumExtraArgs;			uint8_t NumExtraArgs;
	bool Gradients;			bool Gradients;
	bool Coordinates;			bool Coordinates;
	bool LodOrClampOrMip;			bool LodOrClampOrMip;
	bool HasD16;			bool HasD16;
	};			};


llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h

Show First 20 Lines • Show All 796 Lines • ▼ Show 20 Lines	Value *simplifyShrShlDemandedBits(
const APInt &ShlOp1, const APInt &DemandedMask, KnownBits &Known);		const APInt &ShlOp1, const APInt &DemandedMask, KnownBits &Known);

/// Tries to simplify operands to an integer instruction based on its		/// Tries to simplify operands to an integer instruction based on its
/// demanded bits.		/// demanded bits.
bool SimplifyDemandedInstructionBits(Instruction &Inst);		bool SimplifyDemandedInstructionBits(Instruction &Inst);

Value *simplifyAMDGCNMemoryIntrinsicDemanded(IntrinsicInst *II,		Value *simplifyAMDGCNMemoryIntrinsicDemanded(IntrinsicInst *II,
APInt DemandedElts,		APInt DemandedElts,
int DmaskIdx = -1);		int DmaskIdx = -1,
		int TFCIdx = -1);

Value *SimplifyDemandedVectorElts(Value *V, APInt DemandedElts,		Value *SimplifyDemandedVectorElts(Value *V, APInt DemandedElts,
APInt &UndefElts, unsigned Depth = 0);		APInt &UndefElts, unsigned Depth = 0);

/// Canonicalize the position of binops relative to shufflevector.		/// Canonicalize the position of binops relative to shufflevector.
Instruction *foldVectorBinop(BinaryOperator &Inst);		Instruction *foldVectorBinop(BinaryOperator &Inst);

/// Given a binary operator, cast instruction, or select which has a PHI node		/// Given a binary operator, cast instruction, or select which has a PHI node

llvm/trunk/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp

Show First 20 Lines • Show All 963 Lines • ▼ Show 20 Lines	InstCombiner::simplifyShrShlDemandedBits(Instruction *Shr, const APInt &ShrOp1,
}		}

return nullptr;		return nullptr;
}		}

/// Implement SimplifyDemandedVectorElts for amdgcn buffer and image intrinsics.		/// Implement SimplifyDemandedVectorElts for amdgcn buffer and image intrinsics.
Value *InstCombiner::simplifyAMDGCNMemoryIntrinsicDemanded(IntrinsicInst *II,		Value *InstCombiner::simplifyAMDGCNMemoryIntrinsicDemanded(IntrinsicInst *II,
APInt DemandedElts,		APInt DemandedElts,
int DMaskIdx) {		int DMaskIdx,
		int TFCIdx) {
unsigned VWidth = II->getType()->getVectorNumElements();		unsigned VWidth = II->getType()->getVectorNumElements();
if (VWidth == 1)		if (VWidth == 1)
return nullptr;		return nullptr;

		// Need to change to new instruction format
		ConstantInt *TFC = nullptr;
		bool TFELWEEnabled = false;
		if (TFCIdx > 0) {
		TFC = dyn_cast<ConstantInt>(II->getArgOperand(TFCIdx));
		TFELWEEnabled = TFC->getZExtValue() & 0x1 // TFE
		|| TFC->getZExtValue() & 0x2; // LWE
		}

		if (TFELWEEnabled)
		return nullptr; // TFE not yet supported

ConstantInt *NewDMask = nullptr;		ConstantInt *NewDMask = nullptr;

if (DMaskIdx < 0) {		if (DMaskIdx < 0) {
// Pretend that a prefix of elements is demanded to simplify the code		// Pretend that a prefix of elements is demanded to simplify the code
// below.		// below.
DemandedElts = (1 << DemandedElts.getActiveBits()) - 1;		DemandedElts = (1 << DemandedElts.getActiveBits()) - 1;
} else {		} else {
ConstantInt *DMask = dyn_cast<ConstantInt>(II->getArgOperand(DMaskIdx));		ConstantInt *DMask = dyn_cast<ConstantInt>(II->getArgOperand(DMaskIdx));
▲ Show 20 Lines • Show All 632 Lines • ▼ Show 20 Lines	case Instruction::Call: {
case Intrinsic::x86_sse4a_insertqi:		case Intrinsic::x86_sse4a_insertqi:
UndefElts.setHighBits(VWidth / 2);		UndefElts.setHighBits(VWidth / 2);
break;		break;
case Intrinsic::amdgcn_buffer_load:		case Intrinsic::amdgcn_buffer_load:
case Intrinsic::amdgcn_buffer_load_format:		case Intrinsic::amdgcn_buffer_load_format:
return simplifyAMDGCNMemoryIntrinsicDemanded(II, DemandedElts);		return simplifyAMDGCNMemoryIntrinsicDemanded(II, DemandedElts);
default: {		default: {
if (getAMDGPUImageDMaskIntrinsic(II->getIntrinsicID()))		if (getAMDGPUImageDMaskIntrinsic(II->getIntrinsicID()))
return simplifyAMDGCNMemoryIntrinsicDemanded(II, DemandedElts, 0);		return simplifyAMDGCNMemoryIntrinsicDemanded(
		II, DemandedElts, 0, II->getNumArgOperands() - 2);

break;		break;
}		}
} // switch on IntrinsicID		} // switch on IntrinsicID
break;		break;
} // case Call		} // case Call
} // switch on Opcode		} // switch on Opcode
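
The texfailctrl argument that InstCombine inspects packs both flags into one integer. The sketch below is illustrative only: `TexFailCtrl`, `decodeTexFailCtrl`, and `usesExtraResult` are hypothetical names, and the bit layout (bit 0 for TFE, bit 1 for LWE) is taken from the masks used in this diff and from the tests, where values 1, 2, and 3 select tfe, lwe, and both.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded form of the texfailctrl intrinsic argument.
struct TexFailCtrl {
  bool TFE; // bit 0
  bool LWE; // bit 1
};

// Split the packed argument into its two flags.
TexFailCtrl decodeTexFailCtrl(uint64_t Value) {
  return TexFailCtrl{(Value & 0x1) != 0, (Value & 0x2) != 0};
}

// True when the intrinsic returns the extra status dword, in which case the
// demanded-elements simplification in this patch bails out.
bool usesExtraResult(uint64_t Value) {
  TexFailCtrl C = decodeTexFailCtrl(Value);
  return C.TFE || C.LWE;
}
```

A value of 0 leaves the intrinsic eligible for dmask narrowing; any of 1, 2, or 3 makes the function return nullptr, matching the "TFE not yet supported" early exit.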


llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.ll

	; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,SI %s			; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,SI,SIVI,PRT %s
	; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,VI %s			; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,VI,SIVI,PRT %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX900,PRT %s
				; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-enable-prt-strict-null -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX900,NOPRT %s

	; GCN-LABEL: {{^}}load_1d:			; GCN-LABEL: {{^}}load_1d:
	; GCN: image_load v[0:3], v0, s[0:7] dmask:0xf unorm{{$}}			; GCN: image_load v[0:3], v0, s[0:7] dmask:0xf unorm{{$}}
	define amdgpu_ps <4 x float> @load_1d(<8 x i32> inreg %rsrc, i32 %s) {			define amdgpu_ps <4 x float> @load_1d(<8 x i32> inreg %rsrc, i32 %s) {
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

				; GCN-LABEL: {{^}}load_1d_tfe:
				; PRT: v_mov_b32_e32 v0, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; NOPRT: v_mov_b32_e32 v4, 0
				; NOPRT-NOT: v_mov_b32_e32 v0
				; NOPRT-NOT: v_mov_b32_e32 v1
				; NOPRT-NOT: v_mov_b32_e32 v2
				; NOPRT-NOT: v_mov_b32_e32 v3
				; GCN: image_load v[0:7], v{{[0-9]+}}, s[0:7] dmask:0xf unorm tfe{{$}}
				; SIVI: buffer_store_dword v4, off, s[8:11], 0
				; GFX900: global_store_dword v[{{[0-9]+:[0-9]+}}], v4
				define amdgpu_ps <4 x float> @load_1d_tfe(<8 x i32> inreg %rsrc, i32 addrspace(1)* inreg %out, i32 %s) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
				%v.vec = extractvalue {<4 x float>, i32} %v, 0
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <4 x float> %v.vec
				}

				; GCN-LABEL: {{^}}load_1d_lwe:
				; PRT: v_mov_b32_e32 v0, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; NOPRT: v_mov_b32_e32 v4, 0
				; NOPRT-NOT: v_mov_b32_e32 v0
				; NOPRT-NOT: v_mov_b32_e32 v1
				; NOPRT-NOT: v_mov_b32_e32 v2
				; NOPRT-NOT: v_mov_b32_e32 v3
				; GCN: image_load v[0:7], v{{[0-9]+}}, s[0:7] dmask:0xf unorm lwe{{$}}
				; SIVI: buffer_store_dword v4, off, s[8:11], 0
				; GFX900: global_store_dword v[{{[0-9]+:[0-9]+}}], v4
				define amdgpu_ps <4 x float> @load_1d_lwe(<8 x i32> inreg %rsrc, i32 addrspace(1)* inreg %out, i32 %s) {
				main_body:
				%v = call {<4 x float>, i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 2, i32 0)
				%v.vec = extractvalue {<4 x float>, i32} %v, 0
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <4 x float> %v.vec
				}

	; GCN-LABEL: {{^}}load_2d:			; GCN-LABEL: {{^}}load_2d:
	; GCN: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm{{$}}			; GCN: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm{{$}}
	define amdgpu_ps <4 x float> @load_2d(<8 x i32> inreg %rsrc, i32 %s, i32 %t) {			define amdgpu_ps <4 x float> @load_2d(<8 x i32> inreg %rsrc, i32 %s, i32 %t) {
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i32(i32 15, i32 %s, i32 %t, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i32(i32 15, i32 %s, i32 %t, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

				; GCN-LABEL: {{^}}load_2d_tfe:
				; PRT: v_mov_b32_e32 v0, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; NOPRT: v_mov_b32_e32 v4, 0
				; NOPRT-NOT: v_mov_b32_e32 v0
				; NOPRT-NOT: v_mov_b32_e32 v1
				; NOPRT-NOT: v_mov_b32_e32 v2
				; NOPRT-NOT: v_mov_b32_e32 v3
				; GCN: image_load v[0:7], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0xf unorm tfe{{$}}
				; SIVI: buffer_store_dword v4, off, s[8:11], 0
				; GFX900: global_store_dword v[{{[0-9]+:[0-9]+}}], v4
				define amdgpu_ps <4 x float> @load_2d_tfe(<8 x i32> inreg %rsrc, i32 addrspace(1)* inreg %out, i32 %s, i32 %t) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.2d.v4f32i32.i32(i32 15, i32 %s, i32 %t, <8 x i32> %rsrc, i32 1, i32 0)
				%v.vec = extractvalue {<4 x float>, i32} %v, 0
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <4 x float> %v.vec
				}

	; GCN-LABEL: {{^}}load_3d:			; GCN-LABEL: {{^}}load_3d:
	; GCN: image_load v[0:3], v[0:3], s[0:7] dmask:0xf unorm{{$}}			; GCN: image_load v[0:3], v[0:3], s[0:7] dmask:0xf unorm{{$}}
	define amdgpu_ps <4 x float> @load_3d(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %r) {			define amdgpu_ps <4 x float> @load_3d(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %r) {
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %r, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %r, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

				; GCN-LABEL: {{^}}load_3d_tfe_lwe:
				; PRT: v_mov_b32_e32 v0, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; NOPRT: v_mov_b32_e32 v4, 0
				; NOPRT-NOT: v_mov_b32_e32 v0
				; NOPRT-NOT: v_mov_b32_e32 v1
				; NOPRT-NOT: v_mov_b32_e32 v2
				; NOPRT-NOT: v_mov_b32_e32 v3
				; GCN: image_load v[0:7], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0xf unorm tfe lwe{{$}}
				; SIVI: buffer_store_dword v4, off, s[8:11], 0
				; GFX900: global_store_dword v[{{[0-9]+:[0-9]+}}], v4
				define amdgpu_ps <4 x float> @load_3d_tfe_lwe(<8 x i32> inreg %rsrc, i32 addrspace(1)* inreg %out, i32 %s, i32 %t, i32 %r) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.3d.v4f32i32.i32(i32 15, i32 %s, i32 %t, i32 %r, <8 x i32> %rsrc, i32 3, i32 0)
				%v.vec = extractvalue {<4 x float>, i32} %v, 0
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <4 x float> %v.vec
				}

	; GCN-LABEL: {{^}}load_cube:			; GCN-LABEL: {{^}}load_cube:
	; GCN: image_load v[0:3], v[0:3], s[0:7] dmask:0xf unorm da{{$}}			; GCN: image_load v[0:3], v[0:3], s[0:7] dmask:0xf unorm da{{$}}
	define amdgpu_ps <4 x float> @load_cube(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %slice) {			define amdgpu_ps <4 x float> @load_cube(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %slice) {
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.cube.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %slice, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.cube.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %slice, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

				; GCN-LABEL: {{^}}load_cube_lwe:
				; PRT: v_mov_b32_e32 v0, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; NOPRT: v_mov_b32_e32 v4, 0
				; NOPRT-NOT: v_mov_b32_e32 v0
				; NOPRT-NOT: v_mov_b32_e32 v1
				; NOPRT-NOT: v_mov_b32_e32 v2
				; NOPRT-NOT: v_mov_b32_e32 v3
				; GCN: image_load v[0:7], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0xf unorm lwe da{{$}}
				; SIVI: buffer_store_dword v4, off, s[8:11], 0
				; GFX900: global_store_dword v[{{[0-9]+:[0-9]+}}], v4
				define amdgpu_ps <4 x float> @load_cube_lwe(<8 x i32> inreg %rsrc, i32 addrspace(1)* inreg %out, i32 %s, i32 %t, i32 %slice) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.cube.v4f32i32.i32(i32 15, i32 %s, i32 %t, i32 %slice, <8 x i32> %rsrc, i32 2, i32 0)
				%v.vec = extractvalue {<4 x float>, i32} %v, 0
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <4 x float> %v.vec
				}

	; GCN-LABEL: {{^}}load_1darray:			; GCN-LABEL: {{^}}load_1darray:
	; GCN: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm da{{$}}			; GCN: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm da{{$}}
	define amdgpu_ps <4 x float> @load_1darray(<8 x i32> inreg %rsrc, i32 %s, i32 %slice) {			define amdgpu_ps <4 x float> @load_1darray(<8 x i32> inreg %rsrc, i32 %s, i32 %slice) {
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.1darray.v4f32.i32(i32 15, i32 %s, i32 %slice, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.1darray.v4f32.i32(i32 15, i32 %s, i32 %slice, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

				; GCN-LABEL: {{^}}load_1darray_tfe:
				; PRT: v_mov_b32_e32 v0, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; NOPRT: v_mov_b32_e32 v4, 0
				; NOPRT-NOT: v_mov_b32_e32 v0
				; NOPRT-NOT: v_mov_b32_e32 v1
				; NOPRT-NOT: v_mov_b32_e32 v2
				; NOPRT-NOT: v_mov_b32_e32 v3
				; GCN: image_load v[0:7], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0xf unorm tfe da{{$}}
				; SIVI: buffer_store_dword v4, off, s[8:11], 0
				; GFX900: global_store_dword v[{{[0-9]+:[0-9]+}}], v4
				define amdgpu_ps <4 x float> @load_1darray_tfe(<8 x i32> inreg %rsrc, i32 addrspace(1)* inreg %out, i32 %s, i32 %slice) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1darray.v4f32i32.i32(i32 15, i32 %s, i32 %slice, <8 x i32> %rsrc, i32 1, i32 0)
				%v.vec = extractvalue {<4 x float>, i32} %v, 0
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <4 x float> %v.vec
				}

	; GCN-LABEL: {{^}}load_2darray:			; GCN-LABEL: {{^}}load_2darray:
	; GCN: image_load v[0:3], v[0:3], s[0:7] dmask:0xf unorm da{{$}}			; GCN: image_load v[0:3], v[0:3], s[0:7] dmask:0xf unorm da{{$}}
	define amdgpu_ps <4 x float> @load_2darray(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %slice) {			define amdgpu_ps <4 x float> @load_2darray(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %slice) {
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.2darray.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %slice, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.2darray.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %slice, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

				; GCN-LABEL: {{^}}load_2darray_lwe:
				; PRT: v_mov_b32_e32 v0, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; NOPRT: v_mov_b32_e32 v4, 0
				; NOPRT-NOT: v_mov_b32_e32 v0
				; NOPRT-NOT: v_mov_b32_e32 v1
				; NOPRT-NOT: v_mov_b32_e32 v2
				; NOPRT-NOT: v_mov_b32_e32 v3
				; GCN: image_load v[0:7], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0xf unorm lwe da{{$}}
				; SIVI: buffer_store_dword v4, off, s[8:11], 0
				; GFX900: global_store_dword v[{{[0-9]+:[0-9]+}}], v4
				define amdgpu_ps <4 x float> @load_2darray_lwe(<8 x i32> inreg %rsrc, i32 addrspace(1)* inreg %out, i32 %s, i32 %t, i32 %slice) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.2darray.v4f32i32.i32(i32 15, i32 %s, i32 %t, i32 %slice, <8 x i32> %rsrc, i32 2, i32 0)
				%v.vec = extractvalue {<4 x float>, i32} %v, 0
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <4 x float> %v.vec
				}

	; GCN-LABEL: {{^}}load_2dmsaa:			; GCN-LABEL: {{^}}load_2dmsaa:
	; GCN: image_load v[0:3], v[0:3], s[0:7] dmask:0xf unorm{{$}}			; GCN: image_load v[0:3], v[0:3], s[0:7] dmask:0xf unorm{{$}}
	define amdgpu_ps <4 x float> @load_2dmsaa(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %fragid) {			define amdgpu_ps <4 x float> @load_2dmsaa(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %fragid) {
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.2dmsaa.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %fragid, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.2dmsaa.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

				; GCN-LABEL: {{^}}load_2dmsaa_both:
				; PRT: v_mov_b32_e32 v0, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; NOPRT: v_mov_b32_e32 v4, 0
				; NOPRT-NOT: v_mov_b32_e32 v0
				; NOPRT-NOT: v_mov_b32_e32 v1
				; NOPRT-NOT: v_mov_b32_e32 v2
				; NOPRT-NOT: v_mov_b32_e32 v3
				; GCN: image_load v[0:7], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0xf unorm tfe lwe{{$}}
				; SIVI: buffer_store_dword v4, off, s[8:11], 0
				; GFX900: global_store_dword v[{{[0-9]+:[0-9]+}}], v4
				define amdgpu_ps <4 x float> @load_2dmsaa_both(<8 x i32> inreg %rsrc, i32 addrspace(1)* inreg %out, i32 %s, i32 %t, i32 %fragid) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.2dmsaa.v4f32i32.i32(i32 15, i32 %s, i32 %t, i32 %fragid, <8 x i32> %rsrc, i32 3, i32 0)
				%v.vec = extractvalue {<4 x float>, i32} %v, 0
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <4 x float> %v.vec
				}

	; GCN-LABEL: {{^}}load_2darraymsaa:			; GCN-LABEL: {{^}}load_2darraymsaa:
	; GCN: image_load v[0:3], v[0:3], s[0:7] dmask:0xf unorm da{{$}}			; GCN: image_load v[0:3], v[0:3], s[0:7] dmask:0xf unorm da{{$}}
	define amdgpu_ps <4 x float> @load_2darraymsaa(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %slice, i32 %fragid) {			define amdgpu_ps <4 x float> @load_2darraymsaa(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %slice, i32 %fragid) {
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.2darraymsaa.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %slice, i32 %fragid, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.2darraymsaa.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %slice, i32 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

				; GCN-LABEL: {{^}}load_2darraymsaa_tfe:
				; PRT: v_mov_b32_e32 v0, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; NOPRT: v_mov_b32_e32 v4, 0
				; NOPRT-NOT: v_mov_b32_e32 v0
				; NOPRT-NOT: v_mov_b32_e32 v1
				; NOPRT-NOT: v_mov_b32_e32 v2
				; NOPRT-NOT: v_mov_b32_e32 v3
				; GCN: image_load v[0:7], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0xf unorm tfe da{{$}}
				; SIVI: buffer_store_dword v4, off, s[8:11], 0
				; GFX900: global_store_dword v[{{[0-9]+:[0-9]+}}], v4
				define amdgpu_ps <4 x float> @load_2darraymsaa_tfe(<8 x i32> inreg %rsrc, i32 addrspace(1)* inreg %out, i32 %s, i32 %t, i32 %slice, i32 %fragid) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.2darraymsaa.v4f32i32.i32(i32 15, i32 %s, i32 %t, i32 %slice, i32 %fragid, <8 x i32> %rsrc, i32 1, i32 0)
				%v.vec = extractvalue {<4 x float>, i32} %v, 0
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <4 x float> %v.vec
				}

	; GCN-LABEL: {{^}}load_mip_1d:			; GCN-LABEL: {{^}}load_mip_1d:
	; GCN: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm{{$}}			; GCN: image_load_mip v[0:3], v[0:1], s[0:7] dmask:0xf unorm{{$}}
	define amdgpu_ps <4 x float> @load_mip_1d(<8 x i32> inreg %rsrc, i32 %s, i32 %mip) {			define amdgpu_ps <4 x float> @load_mip_1d(<8 x i32> inreg %rsrc, i32 %s, i32 %mip) {
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i32(i32 15, i32 %s, i32 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i32(i32 15, i32 %s, i32 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

				; GCN-LABEL: {{^}}load_mip_1d_lwe:
				; PRT: v_mov_b32_e32 v0, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; NOPRT: v_mov_b32_e32 v4, 0
				; NOPRT-NOT: v_mov_b32_e32 v0
				; NOPRT-NOT: v_mov_b32_e32 v1
				; NOPRT-NOT: v_mov_b32_e32 v2
				; NOPRT-NOT: v_mov_b32_e32 v3
				; GCN: image_load_mip v[0:7], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0xf unorm lwe{{$}}
				; SIVI: buffer_store_dword v4, off, s[8:11], 0
				; GFX900: global_store_dword v[{{[0-9]+:[0-9]+}}], v4
				define amdgpu_ps <4 x float> @load_mip_1d_lwe(<8 x i32> inreg %rsrc, i32 addrspace(1)* inreg %out, i32 %s, i32 %mip) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.mip.1d.v4f32i32.i32(i32 15, i32 %s, i32 %mip, <8 x i32> %rsrc, i32 2, i32 0)
				%v.vec = extractvalue {<4 x float>, i32} %v, 0
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <4 x float> %v.vec
				}

	; GCN-LABEL: {{^}}load_mip_2d:			; GCN-LABEL: {{^}}load_mip_2d:
	; GCN: image_load_mip v[0:3], v[0:3], s[0:7] dmask:0xf unorm{{$}}			; GCN: image_load_mip v[0:3], v[0:3], s[0:7] dmask:0xf unorm{{$}}
	define amdgpu_ps <4 x float> @load_mip_2d(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %mip) {			define amdgpu_ps <4 x float> @load_mip_2d(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %mip) {
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

				; GCN-LABEL: {{^}}load_mip_2d_tfe:
				; PRT: v_mov_b32_e32 v0, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; NOPRT: v_mov_b32_e32 v4, 0
				; NOPRT-NOT: v_mov_b32_e32 v0
				; NOPRT-NOT: v_mov_b32_e32 v1
				; NOPRT-NOT: v_mov_b32_e32 v2
				; NOPRT-NOT: v_mov_b32_e32 v3
				; GCN: image_load_mip v[0:7], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0xf unorm tfe{{$}}
				; SIVI: buffer_store_dword v4, off, s[8:11], 0
				; GFX900: global_store_dword v[{{[0-9]+:[0-9]+}}], v4
				define amdgpu_ps <4 x float> @load_mip_2d_tfe(<8 x i32> inreg %rsrc, i32 addrspace(1)* inreg %out, i32 %s, i32 %t, i32 %mip) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.mip.2d.v4f32i32.i32(i32 15, i32 %s, i32 %t, i32 %mip, <8 x i32> %rsrc, i32 1, i32 0)
				%v.vec = extractvalue {<4 x float>, i32} %v, 0
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <4 x float> %v.vec
				}

				; Make sure that the error flag is returned even with dmask 0
				; GCN-LABEL: {{^}}load_1d_V2_tfe_dmask0:
				; GCN: v_mov_b32_e32 v1, 0
				; PRT-DAG: v_mov_b32_e32 v2, v1
				; PRT: image_load v[1:2], v0, s[0:7] dmask:0x1 unorm tfe{{$}}
				; NOPRT-NOT: v_mov_b32_e32 v1
				; NOPRT: image_load v[0:1], v0, s[0:7] dmask:0x1 unorm tfe{{$}}
				define amdgpu_ps float @load_1d_V2_tfe_dmask0(<8 x i32> inreg %rsrc, i32 %s) {
				main_body:
				%v = call {<2 x float>,i32} @llvm.amdgcn.image.load.1d.v2f32i32.i32(i32 0, i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
				%v.err = extractvalue {<2 x float>, i32} %v, 1
				%vv = bitcast i32 %v.err to float
				ret float %vv
				}

				; GCN-LABEL: {{^}}load_1d_V1_tfe_dmask0:
				; GCN: v_mov_b32_e32 v1, 0
				; PRT-DAG: v_mov_b32_e32 v2, v1
				; PRT: image_load v[1:2], v0, s[0:7] dmask:0x1 unorm tfe{{$}}
				; NOPRT-NOT: v_mov_b32_e32 v1
				; NOPRT: image_load v[0:1], v0, s[0:7] dmask:0x1 unorm tfe{{$}}
				define amdgpu_ps float @load_1d_V1_tfe_dmask0(<8 x i32> inreg %rsrc, i32 %s) {
				main_body:
				%v = call {float,i32} @llvm.amdgcn.image.load.1d.f32i32.i32(i32 0, i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
				%v.err = extractvalue {float, i32} %v, 1
				%vv = bitcast i32 %v.err to float
				ret float %vv
				}

				; GCN-LABEL: {{^}}load_mip_2d_tfe_dmask0:
				; GCN: v_mov_b32_e32 v3, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v3
				; PRT: image_load_mip v[3:4], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0x1 unorm tfe{{$}}
				; NOPRT-NOT: v_mov_b32_e32 v2
				; NOPRT: image_load_mip v[2:3], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0x1 unorm tfe{{$}}
				define amdgpu_ps float @load_mip_2d_tfe_dmask0(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %mip) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.mip.2d.v4f32i32.i32(i32 0, i32 %s, i32 %t, i32 %mip, <8 x i32> %rsrc, i32 1, i32 0)
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				%vv = bitcast i32 %v.err to float
				ret float %vv
				}

				; Do not make dmask 0 even if no result (other than tfe) is used.
				; GCN-LABEL: {{^}}load_mip_2d_tfe_nouse:
				; GCN: v_mov_b32_e32 v3, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v3
				; PRT: image_load_mip v[3:4], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0x1 unorm tfe{{$}}
				; NOPRT-NOT: v_mov_b32_e32 v2
				; NOPRT: image_load_mip v[2:3], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0x1 unorm tfe{{$}}
				define amdgpu_ps float @load_mip_2d_tfe_nouse(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %mip) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.mip.2d.v4f32i32.i32(i32 15, i32 %s, i32 %t, i32 %mip, <8 x i32> %rsrc, i32 1, i32 0)
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				%vv = bitcast i32 %v.err to float
				ret float %vv
				}

				; GCN-LABEL: {{^}}load_mip_2d_tfe_nouse_V2:
				; GCN: v_mov_b32_e32 v3, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v3
				; PRT: image_load_mip v[3:4], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0x1 unorm tfe{{$}}
				; NOPRT-NOT: v_mov_b32_e32 v2
				; NOPRT: image_load_mip v[2:3], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0x1 unorm tfe{{$}}
				define amdgpu_ps float @load_mip_2d_tfe_nouse_V2(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %mip) {
				main_body:
				%v = call {<2 x float>,i32} @llvm.amdgcn.image.load.mip.2d.v2f32i32.i32(i32 6, i32 %s, i32 %t, i32 %mip, <8 x i32> %rsrc, i32 1, i32 0)
				%v.err = extractvalue {<2 x float>, i32} %v, 1
				%vv = bitcast i32 %v.err to float
				ret float %vv
				}

				; GCN-LABEL: {{^}}load_mip_2d_tfe_nouse_V1:
				; GCN: v_mov_b32_e32 v3, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v3
				; PRT: image_load_mip v[3:4], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0x2 unorm tfe{{$}}
				; NOPRT-NOT: v_mov_b32_e32 v2
				; NOPRT: image_load_mip v[2:3], v[{{[0-9]+:[0-9]+}}], s[0:7] dmask:0x2 unorm tfe{{$}}
				define amdgpu_ps float @load_mip_2d_tfe_nouse_V1(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %mip) {
				main_body:
				%v = call {float, i32} @llvm.amdgcn.image.load.mip.2d.f32i32.i32(i32 2, i32 %s, i32 %t, i32 %mip, <8 x i32> %rsrc, i32 1, i32 0)
				%v.err = extractvalue {float, i32} %v, 1
				%vv = bitcast i32 %v.err to float
				ret float %vv
				}

				; Check for dmask being materially smaller than return type
				; GCN-LABEL: {{^}}load_1d_tfe_V4_dmask3:
				; PRT: v_mov_b32_e32 v0, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; NOPRT: v_mov_b32_e32 v3, 0
				; NOPRT-NOT: v_mov_b32_e32 v0
				; NOPRT-NOT: v_mov_b32_e32 v1
				; NOPRT-NOT: v_mov_b32_e32 v2
				; GCN: image_load v[0:3], v{{[0-9]+}}, s[0:7] dmask:0x7 unorm tfe{{$}}
				; SIVI: buffer_store_dword v3, off, s[8:11], 0
				; GFX900: global_store_dword v[{{[0-9]+:[0-9]+}}], v3
				define amdgpu_ps <4 x float> @load_1d_tfe_V4_dmask3(<8 x i32> inreg %rsrc, i32 addrspace(1)* inreg %out, i32 %s) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 7, i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
				%v.vec = extractvalue {<4 x float>, i32} %v, 0
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <4 x float> %v.vec
				}

				; GCN-LABEL: {{^}}load_1d_tfe_V4_dmask2:
				; PRT: v_mov_b32_e32 v0, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; NOPRT: v_mov_b32_e32 v2, 0
				; NOPRT-NOT: v_mov_b32_e32 v0
				; NOPRT-NOT: v_mov_b32_e32 v1
				; GCN: image_load v[0:3], v{{[0-9]+}}, s[0:7] dmask:0x6 unorm tfe{{$}}
				; SIVI: buffer_store_dword v2, off, s[8:11], 0
				; GFX900: global_store_dword v[{{[0-9]+:[0-9]+}}], v2
				define amdgpu_ps <4 x float> @load_1d_tfe_V4_dmask2(<8 x i32> inreg %rsrc, i32 addrspace(1)* inreg %out, i32 %s) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 6, i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
				%v.vec = extractvalue {<4 x float>, i32} %v, 0
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <4 x float> %v.vec
				}

				; GCN-LABEL: {{^}}load_1d_tfe_V4_dmask1:
				; PRT: v_mov_b32_e32 v0, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; NOPRT: v_mov_b32_e32 v1, 0
				; NOPRT-NOT: v_mov_b32_e32 v0
				; GCN: image_load v[0:1], v{{[0-9]+}}, s[0:7] dmask:0x8 unorm tfe{{$}}
				; SIVI: buffer_store_dword v1, off, s[8:11], 0
				; GFX900: global_store_dword v[{{[0-9]+:[0-9]+}}], v1
				define amdgpu_ps <4 x float> @load_1d_tfe_V4_dmask1(<8 x i32> inreg %rsrc, i32 addrspace(1)* inreg %out, i32 %s) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 8, i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
				%v.vec = extractvalue {<4 x float>, i32} %v, 0
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <4 x float> %v.vec
				}

				; GCN-LABEL: {{^}}load_1d_tfe_V2_dmask1:
				; PRT: v_mov_b32_e32 v0, 0
				; PRT-DAG: v_mov_b32_e32 v{{[0-9]+}}, v0
				; NOPRT: v_mov_b32_e32 v1, 0
				; NOPRT-NOT: v_mov_b32_e32 v0
				; GCN: image_load v[0:1], v{{[0-9]+}}, s[0:7] dmask:0x8 unorm tfe{{$}}
				; SIVI: buffer_store_dword v1, off, s[8:11], 0
				; GFX900: global_store_dword v[{{[0-9]+:[0-9]+}}], v1
				define amdgpu_ps <2 x float> @load_1d_tfe_V2_dmask1(<8 x i32> inreg %rsrc, i32 addrspace(1)* inreg %out, i32 %s) {
				main_body:
				%v = call {<2 x float>,i32} @llvm.amdgcn.image.load.1d.v2f32i32.i32(i32 8, i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
				%v.vec = extractvalue {<2 x float>, i32} %v, 0
				%v.err = extractvalue {<2 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <2 x float> %v.vec
				}


	; GCN-LABEL: {{^}}load_mip_3d:			; GCN-LABEL: {{^}}load_mip_3d:
	; GCN: image_load_mip v[0:3], v[0:3], s[0:7] dmask:0xf unorm{{$}}			; GCN: image_load_mip v[0:3], v[0:3], s[0:7] dmask:0xf unorm{{$}}
	define amdgpu_ps <4 x float> @load_mip_3d(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %r, i32 %mip) {			define amdgpu_ps <4 x float> @load_mip_3d(<8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %r, i32 %mip) {
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.mip.3d.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %r, i32 %mip, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.mip.3d.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %r, i32 %mip, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	[306 lines not shown]
	; we only have check lines for VI.			; we only have check lines for VI.
	; VI-LABEL: image_load_mmo			; VI-LABEL: image_load_mmo
	; VI: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0			; VI: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0
	; VI: ds_write2_b32 v{{[0-9]+}}, [[ZERO]], [[ZERO]] offset1:4			; VI: ds_write2_b32 v{{[0-9]+}}, [[ZERO]], [[ZERO]] offset1:4
	define amdgpu_ps float @image_load_mmo(<8 x i32> inreg %rsrc, float addrspace(3)* %lds, <2 x i32> %c) #0 {			define amdgpu_ps float @image_load_mmo(<8 x i32> inreg %rsrc, float addrspace(3)* %lds, <2 x i32> %c) #0 {
	store float 0.000000e+00, float addrspace(3)* %lds			store float 0.000000e+00, float addrspace(3)* %lds
	%c0 = extractelement <2 x i32> %c, i32 0			%c0 = extractelement <2 x i32> %c, i32 0
	%c1 = extractelement <2 x i32> %c, i32 1			%c1 = extractelement <2 x i32> %c, i32 1
	%tex = call float @llvm.amdgcn.image.load.2d.f32.i32(i32 15, i32 %c0, i32 %c1, <8 x i32> %rsrc, i32 0, i32 0)			%tex = call float @llvm.amdgcn.image.load.2d.f32.i32(i32 1, i32 %c0, i32 %c1, <8 x i32> %rsrc, i32 0, i32 0)
	%tmp2 = getelementptr float, float addrspace(3)* %lds, i32 4			%tmp2 = getelementptr float, float addrspace(3)* %lds, i32 4
	store float 0.000000e+00, float addrspace(3)* %tmp2			store float 0.000000e+00, float addrspace(3)* %tmp2
	ret float %tex			ret float %tex
	}			}

	declare <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i32(i32, i32, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i32(i32, i32, <8 x i32>, i32, i32) #1
				declare {float,i32} @llvm.amdgcn.image.load.1d.f32i32.i32(i32, i32, <8 x i32>, i32, i32) #1
				declare {<2 x float>,i32} @llvm.amdgcn.image.load.1d.v2f32i32.i32(i32, i32, <8 x i32>, i32, i32) #1
				declare {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32, i32, <8 x i32>, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i32(i32, i32, i32, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i32(i32, i32, i32, <8 x i32>, i32, i32) #1
				declare {<4 x float>,i32} @llvm.amdgcn.image.load.2d.v4f32i32.i32(i32, i32, i32, <8 x i32>, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1
				declare {<4 x float>,i32} @llvm.amdgcn.image.load.3d.v4f32i32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.load.cube.v4f32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.cube.v4f32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1
				declare {<4 x float>,i32} @llvm.amdgcn.image.load.cube.v4f32i32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.load.1darray.v4f32.i32(i32, i32, i32, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.1darray.v4f32.i32(i32, i32, i32, <8 x i32>, i32, i32) #1
				declare {<4 x float>,i32} @llvm.amdgcn.image.load.1darray.v4f32i32.i32(i32, i32, i32, <8 x i32>, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.load.2darray.v4f32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.2darray.v4f32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1
				declare {<4 x float>,i32} @llvm.amdgcn.image.load.2darray.v4f32i32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.load.2dmsaa.v4f32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.2dmsaa.v4f32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1
				declare {<4 x float>,i32} @llvm.amdgcn.image.load.2dmsaa.v4f32i32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.load.2darraymsaa.v4f32.i32(i32, i32, i32, i32, i32, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.2darraymsaa.v4f32.i32(i32, i32, i32, i32, i32, <8 x i32>, i32, i32) #1
				declare {<4 x float>,i32} @llvm.amdgcn.image.load.2darraymsaa.v4f32i32.i32(i32, i32, i32, i32, i32, <8 x i32>, i32, i32) #1

	declare <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i32(i32, i32, i32, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i32(i32, i32, i32, <8 x i32>, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1
				declare {<4 x float>,i32} @llvm.amdgcn.image.load.mip.1d.v4f32i32.i32(i32, i32, i32, <8 x i32>, i32, i32) #1
				declare {<4 x float>,i32} @llvm.amdgcn.image.load.mip.2d.v4f32i32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1
				declare {<2 x float>,i32} @llvm.amdgcn.image.load.mip.2d.v2f32i32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1
				declare {float,i32} @llvm.amdgcn.image.load.mip.2d.f32i32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.load.mip.3d.v4f32.i32(i32, i32, i32, i32, i32, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.mip.3d.v4f32.i32(i32, i32, i32, i32, i32, <8 x i32>, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.load.mip.cube.v4f32.i32(i32, i32, i32, i32, i32, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.mip.cube.v4f32.i32(i32, i32, i32, i32, i32, <8 x i32>, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.load.mip.1darray.v4f32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.mip.1darray.v4f32.i32(i32, i32, i32, i32, <8 x i32>, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.load.mip.2darray.v4f32.i32(i32, i32, i32, i32, i32, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.mip.2darray.v4f32.i32(i32, i32, i32, i32, i32, <8 x i32>, i32, i32) #1

	declare void @llvm.amdgcn.image.store.1d.v4f32.i32(<4 x float>, i32, i32, <8 x i32>, i32, i32) #0			declare void @llvm.amdgcn.image.store.1d.v4f32.i32(<4 x float>, i32, i32, <8 x i32>, i32, i32) #0
	declare void @llvm.amdgcn.image.store.2d.v4f32.i32(<4 x float>, i32, i32, i32, <8 x i32>, i32, i32) #0			declare void @llvm.amdgcn.image.store.2d.v4f32.i32(<4 x float>, i32, i32, i32, <8 x i32>, i32, i32) #0
	declare void @llvm.amdgcn.image.store.3d.v4f32.i32(<4 x float>, i32, i32, i32, i32, <8 x i32>, i32, i32) #0			declare void @llvm.amdgcn.image.store.3d.v4f32.i32(<4 x float>, i32, i32, i32, i32, <8 x i32>, i32, i32) #0
	[31 lines not shown]
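
The TFE/LWE checks in llvm.amdgcn.image.dim.ll above follow a regular pattern in how many result registers the load occupies and which one carries the status value. The Python sketch below is only an illustration and is not code from the patch: `tfe_layout` is a hypothetical helper, and the rules it encodes (dmask 0 treated as 0x1, one extra status dword after the data, total size rounded up to a power of two) are inferred from the CHECK lines shown here for 32-bit results, not taken from lowerImage itself.

```python
def tfe_layout(dmask: int) -> tuple[int, int]:
    """Return (register_count, status_offset) for a TFE/LWE image load.

    Assumption: 32-bit channels, rules inferred from the FileCheck lines
    in this diff rather than from the backend implementation.
    """
    if not 0 <= dmask <= 0xF:
        raise ValueError("dmask is a 4-bit channel mask")

    # The dmask0 tests expect dmask:0x1 in the emitted instruction,
    # so an all-zero mask still reserves one data register.
    effective = dmask or 0x1
    channels = bin(effective).count("1")

    # One extra dword after the data holds the TFE/LWE status.
    needed = channels + 1

    # The result vector is padded up to the next power of two.
    size = 1
    while size < needed:
        size *= 2

    # The status dword sits directly after the enabled channels.
    return size, channels
```

For `dmask:0xf` this gives eight registers with the status at offset 4, which lines up with `image_load v[0:7]` and the store of `v4` in the tests; `dmask:0x6` gives four registers with the status at offset 2, matching `v[0:3]` and `v2`.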

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.image.load.a16.d16.ll

	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s

	; GCN-LABEL: {{^}}load.f16.1d:			; GCN-LABEL: {{^}}load.f16.1d:
	; GCN: image_load v[0:1], v0, s[0:7] dmask:0x1 unorm a16 d16			; GCN: image_load v0, v0, s[0:7] dmask:0x1 unorm a16 d16
	define amdgpu_ps <4 x half> @load.f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x half> @load.f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x half> @llvm.amdgcn.image.load.1d.v4f16.i16(i32 1, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.1d.v4f16.i16(i32 1, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.v2f16.1d:			; GCN-LABEL: {{^}}load.v2f16.1d:
	; GCN: image_load v[0:1], v0, s[0:7] dmask:0x3 unorm a16 d16			; GCN: image_load v0, v0, s[0:7] dmask:0x3 unorm a16 d16
	define amdgpu_ps <4 x half> @load.v2f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x half> @load.v2f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x half> @llvm.amdgcn.image.load.1d.v4f16.i16(i32 3, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.1d.v4f16.i16(i32 3, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.v3f16.1d:			; GCN-LABEL: {{^}}load.v3f16.1d:
	[10 lines not shown]
	define amdgpu_ps <4 x half> @load.v4f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x half> @load.v4f16.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x half> @llvm.amdgcn.image.load.1d.v4f16.i16(i32 15, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.1d.v4f16.i16(i32 15, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.f16.2d:			; GCN-LABEL: {{^}}load.f16.2d:
	; GCN: image_load v[0:1], v0, s[0:7] dmask:0x1 unorm a16 d16			; GCN: image_load v0, v0, s[0:7] dmask:0x1 unorm a16 d16
	define amdgpu_ps <4 x half> @load.f16.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x half> @load.f16.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x half> @llvm.amdgcn.image.load.2d.v4f16.i16(i32 1, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.2d.v4f16.i16(i32 1, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.v2f16.2d:			; GCN-LABEL: {{^}}load.v2f16.2d:
	; GCN: image_load v[0:1], v0, s[0:7] dmask:0x3 unorm a16 d16			; GCN: image_load v0, v0, s[0:7] dmask:0x3 unorm a16 d16
	define amdgpu_ps <4 x half> @load.v2f16.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x half> @load.v2f16.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x half> @llvm.amdgcn.image.load.2d.v4f16.i16(i32 3, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.2d.v4f16.i16(i32 3, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	[13 lines not shown]
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x half> @llvm.amdgcn.image.load.2d.v4f16.i16(i32 15, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.2d.v4f16.i16(i32 15, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.f16.3d:			; GCN-LABEL: {{^}}load.f16.3d:
	; GCN: image_load v[0:1], v[0:1], s[0:7] dmask:0x1 unorm a16 d16			; GCN: image_load v0, v[0:1], s[0:7] dmask:0x1 unorm a16 d16
	define amdgpu_ps <4 x half> @load.f16.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x half> @load.f16.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x half> @llvm.amdgcn.image.load.3d.v4f16.i16(i32 1, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.3d.v4f16.i16(i32 1, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}

	; GCN-LABEL: {{^}}load.v2f16.3d:			; GCN-LABEL: {{^}}load.v2f16.3d:
	; GCN: image_load v[0:1], v[0:1], s[0:7] dmask:0x3 unorm a16 d16			; GCN: image_load v0, v[0:1], s[0:7] dmask:0x3 unorm a16 d16
	define amdgpu_ps <4 x half> @load.v2f16.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x half> @load.v2f16.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x half> @llvm.amdgcn.image.load.3d.v4f16.i16(i32 3, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x half> @llvm.amdgcn.image.load.3d.v4f16.i16(i32 3, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x half> %v			ret <4 x half> %v
	}			}
	[29 lines not shown]

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.image.load.a16.ll

	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s

	; GCN-LABEL: {{^}}load.f32.1d:			; GCN-LABEL: {{^}}load.f32.1d:
	; GCN: image_load v[0:3], v0, s[0:7] dmask:0x1 unorm a16			; GCN: image_load v0, v0, s[0:7] dmask:0x1 unorm a16
	define amdgpu_ps <4 x float> @load.f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load.f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 1, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 1, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.v2f32.1d:			; GCN-LABEL: {{^}}load.v2f32.1d:
	; GCN: image_load v[0:3], v0, s[0:7] dmask:0x3 unorm a16			; GCN: image_load v[0:1], v0, s[0:7] dmask:0x3 unorm a16
	define amdgpu_ps <4 x float> @load.v2f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load.v2f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 3, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 3, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.v3f32.1d:			; GCN-LABEL: {{^}}load.v3f32.1d:
	[10 lines not shown]
	define amdgpu_ps <4 x float> @load.v4f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load.v4f32.1d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 %x, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.f32.2d:			; GCN-LABEL: {{^}}load.f32.2d:
	; GCN: image_load v[0:3], v0, s[0:7] dmask:0x1 unorm a16			; GCN: image_load v0, v0, s[0:7] dmask:0x1 unorm a16
	define amdgpu_ps <4 x float> @load.f32.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load.f32.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 1, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 1, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.v2f32.2d:			; GCN-LABEL: {{^}}load.v2f32.2d:
	; GCN: image_load v[0:3], v0, s[0:7] dmask:0x3 unorm a16			; GCN: image_load v[0:1], v0, s[0:7] dmask:0x3 unorm a16
	define amdgpu_ps <4 x float> @load.v2f32.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {			define amdgpu_ps <4 x float> @load.v2f32.2d(<8 x i32> inreg %rsrc, <2 x i16> %coords) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 3, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 3, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	[13 lines not shown]
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords, i32 0			%x = extractelement <2 x i16> %coords, i32 0
	%y = extractelement <2 x i16> %coords, i32 1			%y = extractelement <2 x i16> %coords, i32 1
	%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 15, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 15, i16 %x, i16 %y, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.f32.3d:			; GCN-LABEL: {{^}}load.f32.3d:
	; GCN: image_load v[0:3], v[0:1], s[0:7] dmask:0x1 unorm a16			; GCN: image_load v0, v[0:1], s[0:7] dmask:0x1 unorm a16
	define amdgpu_ps <4 x float> @load.f32.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x float> @load.f32.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 1, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 1, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}load.v2f32.3d:			; GCN-LABEL: {{^}}load.v2f32.3d:
	; GCN: image_load v[0:3], v[0:1], s[0:7] dmask:0x3 unorm a16			; GCN: image_load v[0:1], v[0:1], s[0:7] dmask:0x3 unorm a16
	define amdgpu_ps <4 x float> @load.v2f32.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {			define amdgpu_ps <4 x float> @load.v2f32.3d(<8 x i32> inreg %rsrc, <2 x i16> %coords_lo, <2 x i16> %coords_hi) {
	main_body:			main_body:
	%x = extractelement <2 x i16> %coords_lo, i32 0			%x = extractelement <2 x i16> %coords_lo, i32 0
	%y = extractelement <2 x i16> %coords_lo, i32 1			%y = extractelement <2 x i16> %coords_lo, i32 1
	%z = extractelement <2 x i16> %coords_hi, i32 0			%z = extractelement <2 x i16> %coords_hi, i32 0
	%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 3, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i16(i32 3, i16 %x, i16 %y, i16 %z, <8 x i32> %rsrc, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}
	[29 lines not shown]
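
The updated checks in the two a16 test files show non-TFE loads writing only the channels enabled in dmask, where the old checks covered the full return type. A rough sketch of that relationship follows, with the caveat that `data_dwords` is a hypothetical helper and that the packed-d16 rule (two half values per dword on gfx900) is an assumption read off the new CHECK lines, not taken from the backend. Only nonzero masks are modeled, since the tests visible here do not cover dmask 0 without TFE.

```python
def data_dwords(dmask: int, d16_packed: bool = False) -> int:
    """Dwords written by a non-TFE image load, as suggested by the new checks.

    Assumption: one dword per enabled 32-bit channel, or two enabled
    half channels per dword when d16 results are packed.
    """
    if not 1 <= dmask <= 0xF:
        raise ValueError("expected a nonzero 4-bit dmask")

    channels = bin(dmask).count("1")
    if d16_packed:
        # Two 16-bit channels share one dword; round up for odd counts.
        return (channels + 1) // 2
    return channels
```

With 32-bit data, `dmask:0x1` and `dmask:0x3` give one and two dwords (`v0` and `v[0:1]` in llvm.amdgcn.image.load.a16.ll); with packed d16 both masks fit in a single dword, matching the `v0` destinations in llvm.amdgcn.image.load.a16.d16.ll.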

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.d16.dim.ll

	; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck -check-prefix=GCN -check-prefix=UNPACKED %s			; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck -check-prefix=GCN -check-prefix=UNPACKED %s
	; RUN: llc < %s -march=amdgcn -mcpu=gfx810 -verify-machineinstrs | FileCheck -check-prefix=GCN -check-prefix=PACKED -check-prefix=GFX81 %s			; RUN: llc < %s -march=amdgcn -mcpu=gfx810 -verify-machineinstrs | FileCheck -check-prefix=GCN -check-prefix=PACKED -check-prefix=GFX81 %s
	; RUN: llc < %s -march=amdgcn -mcpu=gfx900 -verify-machineinstrs | FileCheck -check-prefix=GCN -check-prefix=PACKED -check-prefix=GFX9 %s			; RUN: llc < %s -march=amdgcn -mcpu=gfx900 -verify-machineinstrs | FileCheck -check-prefix=GCN -check-prefix=PACKED -check-prefix=GFX9 %s

	; GCN-LABEL: {{^}}image_sample_2d_f16:			; GCN-LABEL: {{^}}image_sample_2d_f16:
	; GCN: image_sample v0, v[0:1], s[0:7], s[8:11] dmask:0x1 d16{{$}}			; GCN: image_sample v0, v[0:1], s[0:7], s[8:11] dmask:0x1 d16{{$}}
	define amdgpu_ps half @image_sample_2d_f16(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {			define amdgpu_ps half @image_sample_2d_f16(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {
	main_body:			main_body:
	%tex = call half @llvm.amdgcn.image.sample.2d.f16.f32(i32 1, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%tex = call half @llvm.amdgcn.image.sample.2d.f16.f32(i32 1, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret half %tex			ret half %tex
	}			}

				; GCN-LABEL: {{^}}image_sample_2d_f16_tfe:
				; GCN: v_mov_b32_e32 v{{[0-9]+}}, 0
				; PACKED: image_sample v[2:3], v[0:1], s[0:7], s[8:11] dmask:0x1 tfe d16{{$}}
				; UNPACKED: image_sample v[2:3], v[0:1], s[0:7], s[8:11] dmask:0x1 tfe d16{{$}}
				define amdgpu_ps half @image_sample_2d_f16_tfe(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, i32 addrspace(1)* inreg %out) {
				main_body:
				%tex = call {half,i32} @llvm.amdgcn.image.sample.2d.f16i32.f32(i32 1, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 1, i32 0)
				%tex.vec = extractvalue {half, i32} %tex, 0
				%tex.err = extractvalue {half, i32} %tex, 1
				store i32 %tex.err, i32 addrspace(1)* %out, align 4
				ret half %tex.vec
				}

	; GCN-LABEL: {{^}}image_sample_c_d_1d_v2f16:			; GCN-LABEL: {{^}}image_sample_c_d_1d_v2f16:
	; UNPACKED: image_sample_c_d v[0:1], v[0:3], s[0:7], s[8:11] dmask:0x3 d16{{$}}			; UNPACKED: image_sample_c_d v[0:1], v[0:3], s[0:7], s[8:11] dmask:0x3 d16{{$}}
	; PACKED: image_sample_c_d v0, v[0:3], s[0:7], s[8:11] dmask:0x3 d16{{$}}			; PACKED: image_sample_c_d v0, v[0:3], s[0:7], s[8:11] dmask:0x3 d16{{$}}
	define amdgpu_ps float @image_sample_c_d_1d_v2f16(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dsdv, float %s) {			define amdgpu_ps float @image_sample_c_d_1d_v2f16(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dsdv, float %s) {
	main_body:			main_body:
	%tex = call <2 x half> @llvm.amdgcn.image.sample.c.d.1d.v2f16.f32.f32(i32 3, float %zcompare, float %dsdh, float %dsdv, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%tex = call <2 x half> @llvm.amdgcn.image.sample.c.d.1d.v2f16.f32.f32(i32 3, float %zcompare, float %dsdh, float %dsdv, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	%r = bitcast <2 x half> %tex to float			%r = bitcast <2 x half> %tex to float
	ret float %r			ret float %r
	}			}

				; GCN-LABEL: {{^}}image_sample_c_d_1d_v2f16_tfe:
				; GCN: v_mov_b32_e32 v{{[0-9]+}}, 0
				; UNPACKED: image_sample_c_d v[{{[0-9]+:[0-9]+}}], v[0:3], s[0:7], s[8:11] dmask:0x3 tfe d16{{$}}
				; PACKED: image_sample_c_d v[{{[0-9]+:[0-9]+}}], v[0:3], s[0:7], s[8:11] dmask:0x3 tfe d16{{$}}
				define amdgpu_ps <2 x float> @image_sample_c_d_1d_v2f16_tfe(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dsdv, float %s) {
				main_body:
				%tex = call {<2 x half>,i32} @llvm.amdgcn.image.sample.c.d.1d.v2f16i32.f32.f32(i32 3, float %zcompare, float %dsdh, float %dsdv, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 1, i32 0)
				%tex.vec = extractvalue {<2 x half>, i32} %tex, 0
				%tex.err = extractvalue {<2 x half>, i32} %tex, 1
				%tex.vecf = bitcast <2 x half> %tex.vec to float
				%r.0 = insertelement <2 x float> undef, float %tex.vecf, i32 0
				%tex.errf = bitcast i32 %tex.err to float
				%r = insertelement <2 x float> %r.0, float %tex.errf, i32 1
				ret <2 x float> %r
				}

	; GCN-LABEL: {{^}}image_sample_b_2d_v4f16:			; GCN-LABEL: {{^}}image_sample_b_2d_v4f16:
	; UNPACKED: image_sample_b v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf d16{{$}}			; UNPACKED: image_sample_b v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf d16{{$}}
	; PACKED: image_sample_b v[0:1], v[0:3], s[0:7], s[8:11] dmask:0xf d16{{$}}			; PACKED: image_sample_b v[0:1], v[0:3], s[0:7], s[8:11] dmask:0xf d16{{$}}
	define amdgpu_ps <2 x float> @image_sample_b_2d_v4f16(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %t) {			define amdgpu_ps <2 x float> @image_sample_b_2d_v4f16(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %t) {
	main_body:			main_body:
	%tex = call <4 x half> @llvm.amdgcn.image.sample.b.2d.v4f16.f32.f32(i32 15, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%tex = call <4 x half> @llvm.amdgcn.image.sample.b.2d.v4f16.f32.f32(i32 15, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	%r = bitcast <4 x half> %tex to <2 x float>			%r = bitcast <4 x half> %tex to <2 x float>
	ret <2 x float> %r			ret <2 x float> %r
	}			}

				; GCN-LABEL: {{^}}image_sample_b_2d_v4f16_tfe:
				; GCN: v_mov_b32_e32 v{{[0-9]+}}, 0
				; UNPACKED: image_sample_b v[{{[0-9]+:[0-9]+}}], v[0:3], s[0:7], s[8:11] dmask:0xf tfe d16{{$}}
				; PACKED: image_sample_b v[{{[0-9]+:[0-9]+}}], v[0:3], s[0:7], s[8:11] dmask:0xf tfe d16{{$}}
				define amdgpu_ps <4 x float> @image_sample_b_2d_v4f16_tfe(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %t) {
				main_body:
				%tex = call {<4 x half>,i32} @llvm.amdgcn.image.sample.b.2d.v4f16i32.f32.f32(i32 15, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 1, i32 0)
				%tex.vec = extractvalue {<4 x half>, i32} %tex, 0
				%tex.err = extractvalue {<4 x half>, i32} %tex, 1
				%tex.vecf = bitcast <4 x half> %tex.vec to <2 x float>
				%tex.vecf.0 = extractelement <2 x float> %tex.vecf, i32 0
				%tex.vecf.1 = extractelement <2 x float> %tex.vecf, i32 1
				%r.0 = insertelement <4 x float> undef, float %tex.vecf.0, i32 0
				%r.1 = insertelement <4 x float> %r.0, float %tex.vecf.1, i32 1
				%tex.errf = bitcast i32 %tex.err to float
				%r = insertelement <4 x float> %r.1, float %tex.errf, i32 2
				ret <4 x float> %r
				}

	declare half @llvm.amdgcn.image.sample.2d.f16.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare half @llvm.amdgcn.image.sample.2d.f16.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
				declare {half,i32} @llvm.amdgcn.image.sample.2d.f16i32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
				declare <4 x half> @llvm.amdgcn.image.sample.2d.v4f16.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
				declare {<2 x half>,i32} @llvm.amdgcn.image.sample.2d.v2f16i32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <2 x half> @llvm.amdgcn.image.sample.c.d.1d.v2f16.f32.f32(i32, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <2 x half> @llvm.amdgcn.image.sample.c.d.1d.v2f16.f32.f32(i32, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
				declare {<2 x half>,i32} @llvm.amdgcn.image.sample.c.d.1d.v2f16i32.f32.f32(i32, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x half> @llvm.amdgcn.image.sample.b.2d.v4f16.f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x half> @llvm.amdgcn.image.sample.b.2d.v4f16.f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
				declare {<4 x half>,i32} @llvm.amdgcn.image.sample.b.2d.v4f16i32.f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readonly }			attributes #1 = { nounwind readonly }
	attributes #2 = { nounwind readnone }			attributes #2 = { nounwind readnone }

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.dim.ll

	; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s			; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s

	; GCN-LABEL: {{^}}sample_1d:			; GCN-LABEL: {{^}}sample_1d:
	; GCN: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf{{$}}			; GCN: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf{{$}}
	define amdgpu_ps <4 x float> @sample_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {			define amdgpu_ps <4 x float> @sample_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

				; GCN-LABEL: {{^}}sample_1d_tfe:
				; GCN: v_mov_b32_e32 v0, 0
				; GCN: v_mov_b32_e32 v1, v0
				; GCN: v_mov_b32_e32 v2, v0
				; GCN: v_mov_b32_e32 v3, v0
				; GCN: v_mov_b32_e32 v4, v0
				; GCN: image_sample v[0:7], v5, s[0:7], s[8:11] dmask:0xf tfe{{$}}
				define amdgpu_ps <4 x float> @sample_1d_tfe(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 addrspace(1)* inreg %out, float %s) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
				%v.vec = extractvalue {<4 x float>, i32} %v, 0
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <4 x float> %v.vec
				}

				; GCN-LABEL: {{^}}sample_1d_tfe_adjust_writemask_1:
				; GCN: v_mov_b32_e32 v0, 0
				; GCN: v_mov_b32_e32 v1, v0
				; GCN: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x1 tfe{{$}}
				define amdgpu_ps <2 x float> @sample_1d_tfe_adjust_writemask_1(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 addrspace(1)* inreg %out, float %s) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
				%res.vec = extractvalue {<4 x float>,i32} %v, 0
				%res.f = extractelement <4 x float> %res.vec, i32 0
				%res.err = extractvalue {<4 x float>,i32} %v, 1
				%res.errf = bitcast i32 %res.err to float
				%res.tmp = insertelement <2 x float> undef, float %res.f, i32 0
				%res = insertelement <2 x float> %res.tmp, float %res.errf, i32 1
				ret <2 x float> %res
				}

				; GCN-LABEL: {{^}}sample_1d_tfe_adjust_writemask_2:
				; GCN: v_mov_b32_e32 v0, 0
				; GCN: v_mov_b32_e32 v1, v0
				; GCN: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x2 tfe{{$}}
				define amdgpu_ps <2 x float> @sample_1d_tfe_adjust_writemask_2(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
				%res.vec = extractvalue {<4 x float>,i32} %v, 0
				%res.f = extractelement <4 x float> %res.vec, i32 1
				%res.err = extractvalue {<4 x float>,i32} %v, 1
				%res.errf = bitcast i32 %res.err to float
				%res.tmp = insertelement <2 x float> undef, float %res.f, i32 0
				%res = insertelement <2 x float> %res.tmp, float %res.errf, i32 1
				ret <2 x float> %res
				}

				; GCN-LABEL: {{^}}sample_1d_tfe_adjust_writemask_3:
				; GCN: v_mov_b32_e32 v0, 0
				; GCN: v_mov_b32_e32 v1, v0
				; GCN: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x4 tfe{{$}}
				define amdgpu_ps <2 x float> @sample_1d_tfe_adjust_writemask_3(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
				%res.vec = extractvalue {<4 x float>,i32} %v, 0
				%res.f = extractelement <4 x float> %res.vec, i32 2
				%res.err = extractvalue {<4 x float>,i32} %v, 1
				%res.errf = bitcast i32 %res.err to float
				%res.tmp = insertelement <2 x float> undef, float %res.f, i32 0
				%res = insertelement <2 x float> %res.tmp, float %res.errf, i32 1
				ret <2 x float> %res
				}

				; GCN-LABEL: {{^}}sample_1d_tfe_adjust_writemask_4:
				; GCN: v_mov_b32_e32 v0, 0
				; GCN: v_mov_b32_e32 v1, v0
				; GCN: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x8 tfe{{$}}
				define amdgpu_ps <2 x float> @sample_1d_tfe_adjust_writemask_4(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
				%res.vec = extractvalue {<4 x float>,i32} %v, 0
				%res.f = extractelement <4 x float> %res.vec, i32 3
				%res.err = extractvalue {<4 x float>,i32} %v, 1
				%res.errf = bitcast i32 %res.err to float
				%res.tmp = insertelement <2 x float> undef, float %res.f, i32 0
				%res = insertelement <2 x float> %res.tmp, float %res.errf, i32 1
				ret <2 x float> %res
				}

				; GCN-LABEL: {{^}}sample_1d_tfe_adjust_writemask_12:
				; GCN: v_mov_b32_e32 v0, 0
				; GCN: v_mov_b32_e32 v1, v0
				; GCN: v_mov_b32_e32 v2, v0
				; GCN: image_sample v[0:2], v3, s[0:7], s[8:11] dmask:0x3 tfe{{$}}
				define amdgpu_ps <4 x float> @sample_1d_tfe_adjust_writemask_12(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
				%res.vec = extractvalue {<4 x float>,i32} %v, 0
				%res.f1 = extractelement <4 x float> %res.vec, i32 0
				%res.f2 = extractelement <4 x float> %res.vec, i32 1
				%res.err = extractvalue {<4 x float>,i32} %v, 1
				%res.errf = bitcast i32 %res.err to float
				%res.tmp1 = insertelement <4 x float> undef, float %res.f1, i32 0
				%res.tmp2 = insertelement <4 x float> %res.tmp1, float %res.f2, i32 1
				%res = insertelement <4 x float> %res.tmp2, float %res.errf, i32 2
				ret <4 x float> %res
				}

				; GCN-LABEL: {{^}}sample_1d_tfe_adjust_writemask_24:
				; GCN: v_mov_b32_e32 v0, 0
				; GCN: v_mov_b32_e32 v1, v0
				; GCN: v_mov_b32_e32 v2, v0
				; GCN: image_sample v[0:2], v3, s[0:7], s[8:11] dmask:0xa tfe{{$}}
				define amdgpu_ps <4 x float> @sample_1d_tfe_adjust_writemask_24(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
				%res.vec = extractvalue {<4 x float>,i32} %v, 0
				%res.f1 = extractelement <4 x float> %res.vec, i32 1
				%res.f2 = extractelement <4 x float> %res.vec, i32 3
				%res.err = extractvalue {<4 x float>,i32} %v, 1
				%res.errf = bitcast i32 %res.err to float
				%res.tmp1 = insertelement <4 x float> undef, float %res.f1, i32 0
				%res.tmp2 = insertelement <4 x float> %res.tmp1, float %res.f2, i32 1
				%res = insertelement <4 x float> %res.tmp2, float %res.errf, i32 2
				ret <4 x float> %res
				}

				; GCN-LABEL: {{^}}sample_1d_tfe_adjust_writemask_134:
				; GCN: v_mov_b32_e32 v0, 0
				; GCN: v_mov_b32_e32 v1, v0
				; GCN: v_mov_b32_e32 v2, v0
				; GCN: v_mov_b32_e32 v3, v0
				; GCN: image_sample v[0:3], v4, s[0:7], s[8:11] dmask:0xd tfe{{$}}
				define amdgpu_ps <4 x float> @sample_1d_tfe_adjust_writemask_134(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
				%res.vec = extractvalue {<4 x float>,i32} %v, 0
				%res.f1 = extractelement <4 x float> %res.vec, i32 0
				%res.f2 = extractelement <4 x float> %res.vec, i32 2
				%res.f3 = extractelement <4 x float> %res.vec, i32 3
				%res.err = extractvalue {<4 x float>,i32} %v, 1
				%res.errf = bitcast i32 %res.err to float
				%res.tmp1 = insertelement <4 x float> undef, float %res.f1, i32 0
				%res.tmp2 = insertelement <4 x float> %res.tmp1, float %res.f2, i32 1
				%res.tmp3 = insertelement <4 x float> %res.tmp2, float %res.f3, i32 2
				%res = insertelement <4 x float> %res.tmp3, float %res.errf, i32 3
				ret <4 x float> %res
				}

				; GCN-LABEL: {{^}}sample_1d_lwe:
				; GCN: v_mov_b32_e32 v0, 0
				; GCN: v_mov_b32_e32 v1, v0
				; GCN: v_mov_b32_e32 v2, v0
				; GCN: v_mov_b32_e32 v3, v0
				; GCN: v_mov_b32_e32 v4, v0
				; GCN: image_sample v[0:7], v5, s[0:7], s[8:11] dmask:0xf lwe{{$}}
				define amdgpu_ps <4 x float> @sample_1d_lwe(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 addrspace(1)* inreg %out, float %s) {
				main_body:
				%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 2, i32 0)
				%v.vec = extractvalue {<4 x float>, i32} %v, 0
				%v.err = extractvalue {<4 x float>, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret <4 x float> %v.vec
				}

	; GCN-LABEL: {{^}}sample_2d:			; GCN-LABEL: {{^}}sample_2d:
	; GCN: image_sample v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf{{$}}			; GCN: image_sample v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf{{$}}
	define amdgpu_ps <4 x float> @sample_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {			define amdgpu_ps <4 x float> @sample_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 15, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 15, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GCN-LABEL: {{^}}sample_c_d_o_2darray_V1:			; GCN-LABEL: {{^}}sample_c_d_o_2darray_V1:
	; GCN: image_sample_c_d_o v0, v[0:15], s[0:7], s[8:11] dmask:0x4 da{{$}}			; GCN: image_sample_c_d_o v0, v[0:15], s[0:7], s[8:11] dmask:0x4 da{{$}}
	define amdgpu_ps float @sample_c_d_o_2darray_V1(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %slice) {			define amdgpu_ps float @sample_c_d_o_2darray_V1(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %slice) {
	main_body:			main_body:
	%v = call float @llvm.amdgcn.image.sample.c.d.o.2darray.f32.f32.f32(i32 4, i32 %offset, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call float @llvm.amdgcn.image.sample.c.d.o.2darray.f32.f32.f32(i32 4, i32 %offset, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret float %v			ret float %v
	}			}

				; GCN-LABEL: {{^}}sample_c_d_o_2darray_V1_tfe:
				; GCN: image_sample_c_d_o v[9:10], v[0:15], s[0:7], s[8:11] dmask:0x4 tfe da{{$}}
				define amdgpu_ps float @sample_c_d_o_2darray_V1_tfe(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %slice, i32 addrspace(1)* inreg %out) {
				main_body:
				%v = call {float,i32} @llvm.amdgcn.image.sample.c.d.o.2darray.f32i32.f32.f32(i32 4, i32 %offset, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
				%v.vec = extractvalue {float, i32} %v, 0
				%v.err = extractvalue {float, i32} %v, 1
				store i32 %v.err, i32 addrspace(1)* %out, align 4
				ret float %v.vec
				}

	; GCN-LABEL: {{^}}sample_c_d_o_2darray_V2:			; GCN-LABEL: {{^}}sample_c_d_o_2darray_V2:
	; GCN: image_sample_c_d_o v[0:1], v[0:15], s[0:7], s[8:11] dmask:0x6 da{{$}}			; GCN: image_sample_c_d_o v[0:1], v[0:15], s[0:7], s[8:11] dmask:0x6 da{{$}}
	define amdgpu_ps <2 x float> @sample_c_d_o_2darray_V2(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %slice) {			define amdgpu_ps <2 x float> @sample_c_d_o_2darray_V2(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %slice) {
	main_body:			main_body:
	%v = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f32.f32(i32 6, i32 %offset, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f32.f32(i32 6, i32 %offset, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <2 x float> %v			ret <2 x float> %v
	}			}

				; GCN-LABEL: {{^}}sample_c_d_o_2darray_V2_tfe:
				; GCN: image_sample_c_d_o v[9:12], v[0:15], s[0:7], s[8:11] dmask:0x6 tfe da{{$}}
				define amdgpu_ps <4 x float> @sample_c_d_o_2darray_V2_tfe(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %slice) {
				main_body:
				%v = call {<2 x float>, i32} @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32i32.f32.f32(i32 6, i32 %offset, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
				%v.vec = extractvalue {<2 x float>, i32} %v, 0
				%v.f1 = extractelement <2 x float> %v.vec, i32 0
				%v.f2 = extractelement <2 x float> %v.vec, i32 1
				%v.err = extractvalue {<2 x float>, i32} %v, 1
				%v.errf = bitcast i32 %v.err to float
				%res.0 = insertelement <4 x float> undef, float %v.f1, i32 0
				%res.1 = insertelement <4 x float> %res.0, float %v.f2, i32 1
				%res.2 = insertelement <4 x float> %res.1, float %v.errf, i32 2
				ret <4 x float> %res.2
				}

	; GCN-LABEL: {{^}}sample_1d_unorm:			; GCN-LABEL: {{^}}sample_1d_unorm:
	; GCN: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf unorm{{$}}			; GCN: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf unorm{{$}}
	define amdgpu_ps <4 x float> @sample_1d_unorm(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {			define amdgpu_ps <4 x float> @sample_1d_unorm(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 1, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 1, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <2 x float> @adjust_writemask_sample_013_to_13(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {			define amdgpu_ps <2 x float> @adjust_writemask_sample_013_to_13(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
	main_body:			main_body:
	%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 11, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 11, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	%out = shufflevector <4 x float> %r, <4 x float> undef, <2 x i32> <i32 1, i32 2>			%out = shufflevector <4 x float> %r, <4 x float> undef, <2 x i32> <i32 1, i32 2>
	ret <2 x float> %out			ret <2 x float> %out
	}			}

	declare <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
				declare {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.cube.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.cube.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.2darray.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.2darray.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	declare <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.c.l.2d.v4f32.f32(i32, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.l.2d.v4f32.f32(i32, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	declare <4 x float> @llvm.amdgcn.image.sample.lz.1d.v4f32.f32(i32, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.lz.1d.v4f32.f32(i32, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.c.lz.1d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.lz.1d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.c.lz.2d.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.lz.2d.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	declare float @llvm.amdgcn.image.sample.c.d.o.2darray.f32.f32.f32(i32, i32, float, float, float, float, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare float @llvm.amdgcn.image.sample.c.d.o.2darray.f32.f32.f32(i32, i32, float, float, float, float, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
				declare {float, i32} @llvm.amdgcn.image.sample.c.d.o.2darray.f32i32.f32.f32(i32, i32, float, float, float, float, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f32.f32(i32, i32, float, float, float, float, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f32.f32(i32, i32, float, float, float, float, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
				declare {<2 x float>, i32} @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32i32.f32.f32(i32, i32, float, float, float, float, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readonly }			attributes #1 = { nounwind readonly }
	attributes #2 = { nounwind readnone }			attributes #2 = { nounwind readnone }
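
The tests in this file pass the second-to-last `i32` argument as 1 where the expected instruction carries `tfe` and as 2 where it carries `lwe`, and the `adjust_writemask` cases expect the `dmask` to shrink to the channels that are actually extracted. A minimal Python sketch of that bookkeeping follows. The helper names `texfail_flags` and `narrowed_dmask` are hypothetical and are not LLVM APIs; the sketch only models the values visible in the checks above, not the register allocation or the lowering itself.

```python
# Hypothetical helpers for reading the test expectations above.
# Nothing here is an LLVM API; the names are illustrative only.

TFE_BIT = 0x1  # texfailctrl value 1 is checked as "tfe" in the tests
LWE_BIT = 0x2  # texfailctrl value 2 is checked as "lwe" in the tests


def texfail_flags(texfailctrl):
    """Return the instruction modifiers implied by a texfailctrl value."""
    flags = []
    if texfailctrl & TFE_BIT:
        flags.append("tfe")
    if texfailctrl & LWE_BIT:
        flags.append("lwe")
    return flags


def narrowed_dmask(orig_dmask, used_elements):
    """Compute the dmask that keeps only the result elements in use.

    used_elements are indices into the returned vector, which holds the
    channels enabled in orig_dmask in ascending channel order.
    """
    enabled = [bit for bit in range(4) if orig_dmask & (1 << bit)]
    mask = 0
    for idx in used_elements:
        mask |= 1 << enabled[idx]
    return mask
```

For example, `narrowed_dmask(0xf, [0, 2, 3])` gives `0xd`, which matches the `dmask:0xd` check in `sample_1d_tfe_adjust_writemask_134`, and `narrowed_dmask(0xf, [1, 3])` gives `0xa`, as in `sample_1d_tfe_adjust_writemask_24`.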

llvm/trunk/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll

	; CHECK-NEXT: %data = call float @llvm.amdgcn.image.sample.1d.f32.f32(i32 1, float %vaddr, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 0, i32 0)			; CHECK-NEXT: %data = call float @llvm.amdgcn.image.sample.1d.f32.f32(i32 1, float %vaddr, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 0, i32 0)
	; CHECK-NEXT: ret float %data			; CHECK-NEXT: ret float %data
	define amdgpu_ps float @extract_elt0_image_sample_1d_v4f32_f32(float %vaddr, <8 x i32> inreg %sampler, <4 x i32> inreg %rsrc) #0 {			define amdgpu_ps float @extract_elt0_image_sample_1d_v4f32_f32(float %vaddr, <8 x i32> inreg %sampler, <4 x i32> inreg %rsrc) #0 {
	%data = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %vaddr, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 0, i32 0)			%data = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %vaddr, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 0, i32 0)
	%elt0 = extractelement <4 x float> %data, i32 0			%elt0 = extractelement <4 x float> %data, i32 0
	ret float %elt0			ret float %elt0
	}			}

				; Check that the intrinsic remains unchanged in the presence of TFE or LWE
				; CHECK-LABEL: @extract_elt0_image_sample_1d_v4f32_f32_tfe(
				; CHECK-NEXT: %data = call { <4 x float>, i32 } @llvm.amdgcn.image.sample.1d.sl_v4f32i32s.f32(i32 15, float %vaddr, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 1, i32 0)
				; CHECK: ret float %elt0
				define amdgpu_ps float @extract_elt0_image_sample_1d_v4f32_f32_tfe(float %vaddr, <8 x i32> inreg %sampler, <4 x i32> inreg %rsrc) #0 {
				%data = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.sl_v4f32i32s.f32(i32 15, float %vaddr, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 1, i32 0)
				%data.vec = extractvalue {<4 x float>,i32} %data, 0
				%elt0 = extractelement <4 x float> %data.vec, i32 0
				ret float %elt0
				}

				; Check that the intrinsic remains unchanged in the presence of TFE or LWE
				; CHECK-LABEL: @extract_elt0_image_sample_1d_v4f32_f32_lwe(
				; CHECK-NEXT: %data = call { <4 x float>, i32 } @llvm.amdgcn.image.sample.1d.sl_v4f32i32s.f32(i32 15, float %vaddr, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 2, i32 0)
				; CHECK: ret float %elt0
				define amdgpu_ps float @extract_elt0_image_sample_1d_v4f32_f32_lwe(float %vaddr, <8 x i32> inreg %sampler, <4 x i32> inreg %rsrc) #0 {
				%data = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.sl_v4f32i32s.f32(i32 15, float %vaddr, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 2, i32 0)
				%data.vec = extractvalue {<4 x float>,i32} %data, 0
				%elt0 = extractelement <4 x float> %data.vec, i32 0
				ret float %elt0
				}

				; CHECK-LABEL: @extract_elt0_image_sample_2d_v4f32_f32(
				; CHECK-NEXT: %data = call float @llvm.amdgcn.image.sample.2d.f32.f32(i32 1, float %s, float %t, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 0, i32 0)
				; CHECK-NEXT: ret float %data
				define amdgpu_ps float @extract_elt0_image_sample_2d_v4f32_f32(float %s, float %t, <8 x i32> inreg %sampler, <4 x i32> inreg %rsrc) #0 {
				%data = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 15, float %s, float %t, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 0, i32 0)
				%elt0 = extractelement <4 x float> %data, i32 0
				ret float %elt0
				}
				; [162 lines not shown]
				; CHECK-NEXT: ret <3 x float> %shuf
				define amdgpu_ps <3 x float> @extract_elt0_elt1_elt2_dmask_1111_image_sample_1d_v4f32_f32(float %s, <8 x i32> inreg %sampler, <4 x i32> inreg %rsrc) #0 {
				%data = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 0, i32 0)
				%shuf = shufflevector <4 x float> %data, <4 x float> undef, <3 x i32> <i32 0, i32 1, i32 2>
				ret <3 x float> %shuf
				}

				declare <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
				declare {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.sl_v4f32i32s.f32(i32, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1

				; --------------------------------------------------------------------
				; llvm.amdgcn.image.sample.cl
				; --------------------------------------------------------------------

				; [904 lines not shown]

Diff 175869
