This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Make i16 a legal type for VI subtargets
ClosedPublic

Authored by • tstellarAMD on Mar 10 2016, 9:23 AM.

Details

Reviewers

cfang
arsenm
wdng

Commits

rG115a61560e24: AMDGPU: Add VI i16 support
rG2b3379cdffaa: AMDGPU: Add VI i16 support
rL286464: AMDGPU: Add VI i16 support
rL285939: AMDGPU: Add VI i16 support

Summary

Patch by: Wei Ding

Diff Detail

Build Status

Buildable 836
Build 836: arc lint + arc unit

Event Timeline

wdng updated this revision to Diff 50288. Mar 10 2016, 9:23 AM

wdng retitled this revision from to AMDGPU i16 implementation .

wdng updated this object.

wdng added reviewers: arsenm, • tstellarAMD, cfang.

wdng set the repository for this revision to rL LLVM.

wdng added a project: Restricted Project.

wdng added a subscriber: wdng.

Herald added a subscriber: arsenm. Mar 10 2016, 9:23 AM

All of these tests should go in the existing test files and be run on the other subtargets as well.

I also expected way more load and store test additions. We need to have extload tests for i8->i16, i16->i32, i16->i64, some of which probably exist already.

This patch should limit itself to the basic set of required operations on i16: load and store, add, sub, bitshifts, and conversions. Optimizations like min/max should come later.

lib/Target/AMDGPU/VIInstructions.td
170	This pattern is necessary. I believe I had a test for this in my original patch
178–195	Dead code should be removed
199	Should follow camel case naming convention
201	No spaces around the :s
204–205	Should be indented like other Pats in the file
210	This is incorrect if this is a scalar zext, which currently doesn't happen because there are no scalar i16 instructions (although we may want pseudos for these). To be consistent, this should use S_MOV_B32 to materialize the 0
240–241	These should be using the signed min/max. I also think the min/max matching should be a separate patch
246	These should be done in a separate patch
249–267	Dead code should be removed. These instructions don't exist. However, tests should be added for rotl/rotr/bswap/ctlz/cttz to make sure they are properly expanded once i16 is legal, since most operations are assumed legal by default when their type is.

This is also missing the required register info changes and SIISelLowering changes
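For readers unfamiliar with what "properly expanded" means for rotl/bswap/ctlz here, the following is a minimal sketch of equivalent 16-bit computations built only from shifts, ORs and a wider count-leading-zeros. The helper names (`rotl16`, `bswap16`, `ctlz32`, `ctlz16`) are hypothetical and are not LLVM APIs; the actual legalizer output for VI may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not LLVM APIs): each models one way an i16 operation
// with no native instruction can be rewritten using simpler operations.

// rotl on 16 bits as two shifts and an OR.
constexpr uint16_t rotl16(uint16_t x, unsigned amt) {
  amt &= 15; // the rotate amount is taken modulo the bit width
  return amt == 0 ? x
                  : static_cast<uint16_t>((x << amt) | (x >> (16 - amt)));
}

// bswap on 16 bits is a rotate by 8.
constexpr uint16_t bswap16(uint16_t x) {
  return static_cast<uint16_t>((x << 8) | (x >> 8));
}

// Reference 32-bit count-leading-zeros; returns 32 for an input of 0.
constexpr unsigned ctlz32(uint32_t x) {
  unsigned n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// ctlz on 16 bits via the 32-bit operation: zero-extend, count, subtract 16.
constexpr unsigned ctlz16(uint16_t x) { return ctlz32(x) - 16; }

static_assert(rotl16(0x8001, 1) == 0x0003, "rotl wraps the top bit");
static_assert(bswap16(0x1234) == 0x3412, "bswap swaps the two bytes");
static_assert(ctlz16(0x0001) == 15, "ctlz16 is ctlz32 minus 16");
```

Tests of this shape (one per operation, checked on every subtarget) are what the comment above asks for, so that an operation wrongly left as legal is caught at instruction selection.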

arsenm edited edge metadata. Mar 12 2016, 5:57 PM

arsenm added a subscriber: llvm-commits.

global-extload-i16.ll contains extload tests for i16->i32 and i16->i64. A test for load/store i8->i16 has been added to global-extload-i8.ll.
Dead code has been removed.
Fixed indentation issues.
Fixed scalar zext, used S_MOV_B32 to materialize the 0
Added tests for shl, sra, sub
Removed min/max, which will be committed into another separate patch.

wdng updated this revision to Diff 51139. Mar 20 2016, 8:53 PM

wdng updated this object.

• tstellarAMD added inline comments. Mar 21 2016, 7:30 AM

lib/Target/AMDGPU/SIISelLowering.cpp
264–268	Are these really necessary? I thought making a type legal marked these operations legal by default.
270–276	Same with these too. Are they really necessary?
1743–1756	Why does i16 need special handling here? This seems to be nearly identical to the block of code directly below.
2022–2031	Why do we need to custom lower i16 stores? Can't we just mark them as promote?

arsenm added inline comments. Mar 23 2016, 9:44 AM

lib/Target/AMDGPU/SIISelLowering.cpp
270–276	min/max need to be explicitly made legal, but that should be a separate patch. setcc should be promote for now until those are added later

arsenm added inline comments. Mar 23 2016, 9:45 AM

lib/Target/AMDGPU/SIISelLowering.cpp
65	I did this already, so this should check Subtarget->has16BitInsts()

arsenm added inline comments. Mar 23 2016, 10:09 AM

lib/Target/AMDGPU/SIISelLowering.cpp
2022–2031	Load/store promote expects an equal size type for a bitcast promote. This is the same problem that i1 has, so it should follow that example
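A rough illustration of the point about bitcast promotion, as I read it: a promote implemented as a bitcast only makes sense between types of the same size, so f32 to i32 works, while i16 to i32 needs an explicit extend on the way in and a truncate on the way out. The helper names below are hypothetical and only model the value semantics; they are not the code path in SIISelLowering.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of a "bitcast promote": reinterpret the bits of one type
// as another. It is only well-defined when both types have the same size.
template <typename To, typename From>
To bitcastPromote(const From &value) {
  static_assert(sizeof(To) == sizeof(From),
                "bitcast promote needs equal-size types");
  To out;
  std::memcpy(&out, &value, sizeof(To));
  return out;
}

// An i16 value cannot be bitcast to i32 (the sizes differ). It has to be
// zero-extended first, and only the low 16 bits are written back on a store.
inline uint32_t zextI16(uint16_t value) { return value; }
inline uint16_t truncStoreI16(uint32_t wide) {
  return static_cast<uint16_t>(wide & 0xFFFFu);
}
```

Calling `bitcastPromote<uint32_t>(uint16_t{1})` fails to compile, which is the same mismatch the comment describes for i16 (and i1) stores.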

Please temporarily ignore the code changes in function LowerFDIV32(). I will finish the code development for fp32 div in the next patch.

Can you send a new patch without the FP32 div change, so I can commit it?

Revert FDIV32 code changes. This patch contains only the i16 implementation, based on Tom's and Matt's comments.

I tried to apply this patch, but I get a lot of failing lit tests in test/CodeGen/AMDGPU. Do all the tests in this directory pass for you?

lib/Target/AMDGPU/SIISelLowering.cpp
81	Do we still need this comment?
2612–2613	Does this need to be removed?
2620	Coding style. Variable names should start with a capital.

• tstellarAMD commandeered this revision. Oct 17 2016, 2:11 PM

• tstellarAMD edited reviewers, added: wdng; removed: • tstellarAMD.

Herald added subscribers: tony-tye, yaxunl, nhaehnle, kzhuravl. Oct 17 2016, 2:11 PM

Rebase on top of master.

• tstellarAMD retitled this revision from AMDGPU i16 implementation to AMDGPU/SI: Make i16 a legal type for VI subtargets. Oct 17 2016, 2:13 PM

• tstellarAMD updated this object.

LGTM overall with minor fixes:

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
600	Subtarget->has16BitInsts()
610	Subtarget->has16BitInsts()
2367	Space after //. Capitalize.
2368	Subtarget->has16BitInsts()
lib/Target/AMDGPU/SIISelLowering.cpp
37	Alphabetize
81	I do not think we need this comment
82	Subtarget->has16BitInsts()
227	Subtarget->has16BitInsts()
271	Detabify
273	Detabify
275	Detabify
277	Detabify
278	Remove extra new line

kzhuravl added parent revisions: D25805: [AMDGPU] Perform uchar to float combine for ISD::SINT_TO_FP, D25803: [AMDGPU] Promote udiv/sdiv (i1, i16] operations to i32, D25802: [AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32. Oct 19 2016, 5:04 PM

kzhuravl mentioned this in D25805: [AMDGPU] Perform uchar to float combine for ISD::SINT_TO_FP. Oct 21 2016, 3:11 PM

kzhuravl mentioned this in rL284891: [AMDGPU] Perform uchar to float combine for ISD::SINT_TO_FP. Oct 21 2016, 3:19 PM

kzhuravl mentioned this in D25802: [AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32. Oct 21 2016, 4:12 PM

kzhuravl added a child revision: D25975: AMDGPU/SI: Make f16 a legal type for VI subtargets. Oct 26 2016, 1:40 AM

arsenm added inline comments. Oct 26 2016, 9:44 AM

lib/Target/AMDGPU/BUFInstructions.td
962–963	Originally I was avoiding these by promoting the load return type and truncating. Is this the same or is there an advantage from known bits types of things knowing it is really a 32-bit extload?
lib/Target/AMDGPU/SIISelLowering.cpp
1929	Why is this part of the patch? This looks unrelated
3439–3440	Define on same line

Address review comments.

• tstellarAMD added inline comments. Oct 27 2016, 1:24 PM

lib/Target/AMDGPU/BUFInstructions.td
962–963	The main problem is that extload must be legal, so it's hard to custom lower all extloads. I attempted to do this, and I think it would require a lot of work in the backend and also the legalizer to make this work. I'm not sure it's worth the effort at this point.
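To make the two alternatives in this exchange concrete, here is a small sketch of the value semantics only: an i8 extload that produces i16 directly, versus an extload promoted to i32 followed by a truncate to i16. The function names are hypothetical. The sketch only shows that both routes yield the same i16 value; it says nothing about which is easier to legalize, which is the actual point of disagreement.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical models of the two lowering strategies for an i8 -> i16 extload.

// Strategy 1: the extload is legal and produces i16 directly.
constexpr int16_t sextLoadI8ToI16(int8_t mem) {
  return static_cast<int16_t>(mem);
}
constexpr uint16_t zextLoadI8ToI16(uint8_t mem) {
  return static_cast<uint16_t>(mem);
}

// Strategy 2: promote the load result to i32, then truncate to i16.
constexpr int16_t sextLoadI8ViaI32(int8_t mem) {
  const int32_t wide = static_cast<int32_t>(mem); // 32-bit sign-extending load
  return static_cast<int16_t>(wide);              // value always fits in i16
}
constexpr uint16_t zextLoadI8ViaI32(uint8_t mem) {
  const uint32_t wide = static_cast<uint32_t>(mem); // 32-bit zero-extending load
  return static_cast<uint16_t>(wide);
}

static_assert(sextLoadI8ToI16(-1) == sextLoadI8ViaI32(-1), "same i16 result");
static_assert(zextLoadI8ToI16(0xFF) == zextLoadI8ViaI32(0xFF), "same i16 result");
```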

arsenm added inline comments. Oct 27 2016, 1:53 PM

lib/Target/AMDGPU/BUFInstructions.td
960	This can be refined to a has 16-bit predicate
lib/Target/AMDGPU/VOP1Instructions.td
614–618	Commented out code
lib/Target/AMDGPU/VOP2Instructions.td
348	This should only need to be around the actual defs, not the multiclasses
355	I don't think you need to repeat the type in the output here
lib/Target/AMDGPU/VOP3Instructions.td
232	Line wrapping

Address review comments.

LGTM

This revision is now accepted and ready to land. Oct 28 2016, 11:26 AM

kzhuravl mentioned this in rL285716: [AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32. Nov 1 2016, 10:59 AM

Closed by commit rL285939: AMDGPU: Add VI i16 support (authored by tstellar). Nov 3 2016, 10:23 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

lib/

Target/

AMDGPU/

AMDGPUISelLowering.cpp

27 lines

AMDGPUInstructions.td

6 lines

51 lines

10 lines

6 lines

76 lines

1 line

105 lines

11 lines

37 lines

54 lines

72 lines

31 lines

test/

CodeGen/

AMDGPU/

add.i16.ll

149 lines

anyext.ll

45 lines

bitreverse.ll

3 lines

cgp-bitfield-extract.ll

9 lines

141 lines

1 line

4 lines

8 lines

206 lines

global-extload-i16.ll

302 lines

half.ll

36 lines

llvm.AMDGPU.bfe.u32.ll

12 lines

15 lines

15 lines

20 lines

12 lines

20 lines

20 lines

87 lines

12 lines

42 lines

46 lines

30 lines

40 lines

trunc-bitcast-vector.ll

11 lines

trunc-store-i1.ll

11 lines

zero_extend.ll

47 lines

Diff 76087

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

[577 lines not shown]	bool AMDGPUTargetLowering::aggressivelyPreferBuildVectorSources(EVT VecVT) const {
//		//
// We should probably only do this if all users are extracts only, but this		// We should probably only do this if all users are extracts only, but this
// should be the common case.		// should be the common case.
return true;		return true;
}		}

bool AMDGPUTargetLowering::isTruncateFree(EVT Source, EVT Dest) const {		bool AMDGPUTargetLowering::isTruncateFree(EVT Source, EVT Dest) const {
// Truncate is just accessing a subregister.		// Truncate is just accessing a subregister.
return Dest.bitsLT(Source) && (Dest.getSizeInBits() % 32 == 0);
		unsigned SrcSize = Source.getSizeInBits();
		unsigned DestSize = Dest.getSizeInBits();

		return DestSize < SrcSize && DestSize % 32 == 0 ;
}		}

bool AMDGPUTargetLowering::isTruncateFree(Type Source, Type Dest) const {		bool AMDGPUTargetLowering::isTruncateFree(Type Source, Type Dest) const {
// Truncate is just accessing a subregister.		// Truncate is just accessing a subregister.
return Dest->getPrimitiveSizeInBits() < Source->getPrimitiveSizeInBits() &&
(Dest->getPrimitiveSizeInBits() % 32 == 0);		unsigned SrcSize = Source->getScalarSizeInBits();
		unsigned DestSize = Dest->getScalarSizeInBits();

		if (DestSize== 16 && Subtarget->has16BitInsts())
		return SrcSize >= 32;
		kzhuravl: Subtarget->has16BitInsts()

		return DestSize < SrcSize && DestSize % 32 == 0;
}		}

bool AMDGPUTargetLowering::isZExtFree(Type Src, Type Dest) const {		bool AMDGPUTargetLowering::isZExtFree(Type Src, Type Dest) const {
unsigned SrcSize = Src->getScalarSizeInBits();		unsigned SrcSize = Src->getScalarSizeInBits();
unsigned DestSize = Dest->getScalarSizeInBits();		unsigned DestSize = Dest->getScalarSizeInBits();

		if (SrcSize == 16 && Subtarget->has16BitInsts())
		return DestSize >= 32;
		kzhuravl: Subtarget->has16BitInsts()

return SrcSize == 32 && DestSize == 64;		return SrcSize == 32 && DestSize == 64;
}		}

bool AMDGPUTargetLowering::isZExtFree(EVT Src, EVT Dest) const {		bool AMDGPUTargetLowering::isZExtFree(EVT Src, EVT Dest) const {
// Any register load of a 64-bit value really requires 2 32-bit moves. For all		// Any register load of a 64-bit value really requires 2 32-bit moves. For all
// practical purposes, the extra mov 0 to load a 64-bit is free. As used,		// practical purposes, the extra mov 0 to load a 64-bit is free. As used,
// this will enable reducing 64-bit operations the 32-bit, which is always		// this will enable reducing 64-bit operations the 32-bit, which is always
// good.		// good.

		if (Src == MVT::i16)
		return Dest == MVT::i32 ||Dest == MVT::i64 ;

return Src == MVT::i32 && Dest == MVT::i64;		return Src == MVT::i32 && Dest == MVT::i64;
}		}

bool AMDGPUTargetLowering::isZExtFree(SDValue Val, EVT VT2) const {		bool AMDGPUTargetLowering::isZExtFree(SDValue Val, EVT VT2) const {
return isZExtFree(Val.getValueType(), VT2);		return isZExtFree(Val.getValueType(), VT2);
}		}

bool AMDGPUTargetLowering::isNarrowingProfitable(EVT SrcVT, EVT DestVT) const {		bool AMDGPUTargetLowering::isNarrowingProfitable(EVT SrcVT, EVT DestVT) const {
[1,727 lines not shown]
SDValue AMDGPUTargetLowering::performMulCombine(SDNode *N,		SDValue AMDGPUTargetLowering::performMulCombine(SDNode *N,
DAGCombinerInfo &DCI) const {		DAGCombinerInfo &DCI) const {
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

unsigned Size = VT.getSizeInBits();		unsigned Size = VT.getSizeInBits();
if (VT.isVector() || Size > 64)		if (VT.isVector() || Size > 64)
return SDValue();		return SDValue();

		// There are i16 integer mul/mad.
		kzhuravl: Space after //. Capitalize.
		if (Subtarget->has16BitInsts() && VT.getScalarType().bitsLE(MVT::i16))
		kzhuravl: Subtarget->has16BitInsts()
		return SDValue();

SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
SDLoc DL(N);		SDLoc DL(N);

SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
SDValue Mul;		SDValue Mul;

if (Subtarget->hasMulU24() && isU24(N0, DAG) && isU24(N1, DAG)) {		if (Subtarget->hasMulU24() && isU24(N0, DAG) && isU24(N1, DAG)) {
[611 lines not shown]
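The `Type`-based `isTruncateFree` and `isZExtFree` changes in the diff above can be read as the following predicates over plain bit widths. This is a standalone sketch, assuming the subtarget query is reduced to a boolean parameter (`Has16BitInsts`, standing in for `Subtarget->has16BitInsts()`); it is not the LLVM code itself.

```cpp
#include <cassert>

// Standalone model of the new cost hooks. Sizes are scalar bit widths.
// Has16BitInsts stands in for Subtarget->has16BitInsts() (an assumption made
// so the sketch compiles without LLVM headers).

constexpr bool isTruncateFree(unsigned SrcSize, unsigned DestSize,
                              bool Has16BitInsts) {
  // With 16-bit instructions, truncating a 32-bit or wider value to 16 bits
  // is treated as free.
  if (DestSize == 16 && Has16BitInsts)
    return SrcSize >= 32;
  // Otherwise a truncate is free only when it selects a 32-bit subregister.
  return DestSize < SrcSize && DestSize % 32 == 0;
}

constexpr bool isZExtFree(unsigned SrcSize, unsigned DestSize,
                          bool Has16BitInsts) {
  // With 16-bit instructions, zero-extending i16 to 32 bits or wider is free.
  if (SrcSize == 16 && Has16BitInsts)
    return DestSize >= 32;
  return SrcSize == 32 && DestSize == 64;
}

static_assert(isTruncateFree(32, 16, true), "i32 -> i16 is free on VI");
static_assert(!isTruncateFree(32, 16, false), "but not without 16-bit insts");
static_assert(isZExtFree(16, 32, true), "i16 -> i32 zext is free on VI");
```

Note that the `EVT` overload of `isZExtFree` in the diff checks `Src == MVT::i16` without consulting the subtarget, so it behaves like this model with `Has16BitInsts` fixed to true.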

lib/Target/AMDGPU/AMDGPUInstructions.td

[523 lines not shown]	multiclass BFIPatterns <Instruction BFI_INT,
// z ^ (x & (y ^ z))		// z ^ (x & (y ^ z))
def : Pat <		def : Pat <
(xor i32:$z, (and i32:$x, (xor i32:$y, i32:$z))),		(xor i32:$z, (and i32:$x, (xor i32:$y, i32:$z))),
(BFI_INT $x, $y, $z)		(BFI_INT $x, $y, $z)
>;		>;

def : Pat <		def : Pat <
(fcopysign f32:$src0, f32:$src1),		(fcopysign f32:$src0, f32:$src1),
(BFI_INT (LoadImm32 0x7fffffff), $src0, $src1)		(BFI_INT (LoadImm32 (i32 0x7fffffff)), $src0, $src1)
>;		>;

def : Pat <		def : Pat <
(f64 (fcopysign f64:$src0, f64:$src1)),		(f64 (fcopysign f64:$src0, f64:$src1)),
(REG_SEQUENCE RC64,		(REG_SEQUENCE RC64,
(i32 (EXTRACT_SUBREG $src0, sub0)), sub0,		(i32 (EXTRACT_SUBREG $src0, sub0)), sub0,
(BFI_INT (LoadImm32 0x7fffffff),		(BFI_INT (LoadImm32 (i32 0x7fffffff)),
(i32 (EXTRACT_SUBREG $src0, sub1)),		(i32 (EXTRACT_SUBREG $src0, sub1)),
(i32 (EXTRACT_SUBREG $src1, sub1))), sub1)		(i32 (EXTRACT_SUBREG $src1, sub1))), sub1)
>;		>;

def : Pat <		def : Pat <
(f64 (fcopysign f64:$src0, f32:$src1)),		(f64 (fcopysign f64:$src0, f32:$src1)),
(REG_SEQUENCE RC64,		(REG_SEQUENCE RC64,
(i32 (EXTRACT_SUBREG $src0, sub0)), sub0,		(i32 (EXTRACT_SUBREG $src0, sub0)), sub0,
(BFI_INT (LoadImm32 0x7fffffff),		(BFI_INT (LoadImm32 (i32 0x7fffffff)),
(i32 (EXTRACT_SUBREG $src0, sub1)),		(i32 (EXTRACT_SUBREG $src0, sub1)),
$src1), sub1)		$src1), sub1)
>;		>;
}		}

// SHA-256 Ma patterns		// SHA-256 Ma patterns

// ((x & z) | (y & (x | z))) -> BFI_INT (XOR x, y), z, y		// ((x & z) | (y & (x | z))) -> BFI_INT (XOR x, y), z, y
[89 lines not shown]

lib/Target/AMDGPU/BUFInstructions.td

[704 lines not shown]
// MUBUF Patterns		// MUBUF Patterns
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

let Predicates = [isGCN] in {		let Predicates = [isGCN] in {

// int_SI_vs_load_input		// int_SI_vs_load_input
def : Pat<		def : Pat<
(SIload_input v4i32:$tlst, imm:$attr_offset, i32:$buf_idx_vgpr),		(SIload_input v4i32:$tlst, imm:$attr_offset, i32:$buf_idx_vgpr),
(BUFFER_LOAD_FORMAT_XYZW_IDXEN $buf_idx_vgpr, $tlst, 0, imm:$attr_offset, 0, 0, 0)		(BUFFER_LOAD_FORMAT_XYZW_IDXEN $buf_idx_vgpr, $tlst, (i32 0), imm:$attr_offset, 0, 0, 0)
>;		>;

// Offset in an 32-bit VGPR		// Offset in an 32-bit VGPR
def : Pat <		def : Pat <
(SIload_constant v4i32:$sbase, i32:$voff),		(SIload_constant v4i32:$sbase, i32:$voff),
(BUFFER_LOAD_DWORD_OFFEN $voff, $sbase, 0, 0, 0, 0, 0)		(BUFFER_LOAD_DWORD_OFFEN $voff, $sbase, (i32 0), 0, 0, 0, 0)
>;		>;


//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// buffer_load/store_format patterns		// buffer_load/store_format patterns
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

multiclass MUBUF_LoadIntrinsicPat<SDPatternOperator name, ValueType vt,		multiclass MUBUF_LoadIntrinsicPat<SDPatternOperator name, ValueType vt,
[183 lines not shown]	(EXTRACT_SUBREG
(BUFFER_ATOMIC_CMPSWAP_RTN_BOTHEN		(BUFFER_ATOMIC_CMPSWAP_RTN_BOTHEN
(REG_SEQUENCE VReg_64, $data, sub0, $cmp, sub1),		(REG_SEQUENCE VReg_64, $data, sub0, $cmp, sub1),
(REG_SEQUENCE VReg_64, $vindex, sub0, $voffset, sub1),		(REG_SEQUENCE VReg_64, $vindex, sub0, $voffset, sub1),
$rsrc, $soffset, (as_i16imm $offset), (as_i1imm $slc)),		$rsrc, $soffset, (as_i16imm $offset), (as_i1imm $slc)),
sub0)		sub0)
>;		>;


class MUBUFLoad_Pattern <MUBUF_Pseudo Instr_ADDR64, ValueType vt,		class MUBUFLoad_PatternADDR64 <MUBUF_Pseudo Instr_ADDR64, ValueType vt,
PatFrag constant_ld> : Pat <		PatFrag constant_ld> : Pat <
(vt (constant_ld (MUBUFAddr64 v4i32:$srsrc, i64:$vaddr, i32:$soffset,		(vt (constant_ld (MUBUFAddr64 v4i32:$srsrc, i64:$vaddr, i32:$soffset,
i16:$offset, i1:$glc, i1:$slc, i1:$tfe))),		i16:$offset, i1:$glc, i1:$slc, i1:$tfe))),
(Instr_ADDR64 $vaddr, $srsrc, $soffset, $offset, $glc, $slc, $tfe)		(Instr_ADDR64 $vaddr, $srsrc, $soffset, $offset, $glc, $slc, $tfe)
>;		>;

multiclass MUBUFLoad_Atomic_Pattern <MUBUF_Pseudo Instr_ADDR64, MUBUF_Pseudo Instr_OFFSET,		multiclass MUBUFLoad_Atomic_Pattern <MUBUF_Pseudo Instr_ADDR64, MUBUF_Pseudo Instr_OFFSET,
ValueType vt, PatFrag atomic_ld> {		ValueType vt, PatFrag atomic_ld> {
def : Pat <		def : Pat <
(vt (atomic_ld (MUBUFAddr64 v4i32:$srsrc, i64:$vaddr, i32:$soffset,		(vt (atomic_ld (MUBUFAddr64 v4i32:$srsrc, i64:$vaddr, i32:$soffset,
i16:$offset, i1:$slc))),		i16:$offset, i1:$slc))),
(Instr_ADDR64 $vaddr, $srsrc, $soffset, $offset, 1, $slc, 0)		(Instr_ADDR64 $vaddr, $srsrc, $soffset, $offset, 1, $slc, 0)
>;		>;

def : Pat <		def : Pat <
(vt (atomic_ld (MUBUFOffsetNoGLC v4i32:$rsrc, i32:$soffset, i16:$offset))),		(vt (atomic_ld (MUBUFOffsetNoGLC v4i32:$rsrc, i32:$soffset, i16:$offset))),
(Instr_OFFSET $rsrc, $soffset, (as_i16imm $offset), 1, 0, 0)		(Instr_OFFSET $rsrc, $soffset, (as_i16imm $offset), 1, 0, 0)
>;		>;
}		}

let Predicates = [isSICI] in {		let Predicates = [isSICI] in {
def : MUBUFLoad_Pattern <BUFFER_LOAD_SBYTE_ADDR64, i32, sextloadi8_constant>;		def : MUBUFLoad_PatternADDR64 <BUFFER_LOAD_SBYTE_ADDR64, i32, sextloadi8_constant>;
def : MUBUFLoad_Pattern <BUFFER_LOAD_UBYTE_ADDR64, i32, az_extloadi8_constant>;		def : MUBUFLoad_PatternADDR64 <BUFFER_LOAD_UBYTE_ADDR64, i32, az_extloadi8_constant>;
def : MUBUFLoad_Pattern <BUFFER_LOAD_SSHORT_ADDR64, i32, sextloadi16_constant>;		def : MUBUFLoad_PatternADDR64 <BUFFER_LOAD_SSHORT_ADDR64, i32, sextloadi16_constant>;
def : MUBUFLoad_Pattern <BUFFER_LOAD_USHORT_ADDR64, i32, az_extloadi16_constant>;		def : MUBUFLoad_PatternADDR64 <BUFFER_LOAD_USHORT_ADDR64, i32, az_extloadi16_constant>;

defm : MUBUFLoad_Atomic_Pattern <BUFFER_LOAD_DWORD_ADDR64, BUFFER_LOAD_DWORD_OFFSET, i32, mubuf_load_atomic>;		defm : MUBUFLoad_Atomic_Pattern <BUFFER_LOAD_DWORD_ADDR64, BUFFER_LOAD_DWORD_OFFSET, i32, mubuf_load_atomic>;
defm : MUBUFLoad_Atomic_Pattern <BUFFER_LOAD_DWORDX2_ADDR64, BUFFER_LOAD_DWORDX2_OFFSET, i64, mubuf_load_atomic>;		defm : MUBUFLoad_Atomic_Pattern <BUFFER_LOAD_DWORDX2_ADDR64, BUFFER_LOAD_DWORDX2_OFFSET, i64, mubuf_load_atomic>;
} // End Predicates = [isSICI]		} // End Predicates = [isSICI]

		multiclass MUBUFLoad_Pattern <MUBUF_Pseudo Instr_OFFSET, ValueType vt,
		PatFrag ld> {

		def : Pat <
		(vt (ld (MUBUFOffset v4i32:$srsrc, i32:$soffset,
		i16:$offset, i1:$glc, i1:$slc, i1:$tfe))),
		(Instr_OFFSET $srsrc, $soffset, $offset, $glc, $slc, $tfe)
		>;
		}

		let Predicates = [isVI] in {
		arsenm: This can be refined to a has 16-bit predicate

		defm : MUBUFLoad_Pattern <BUFFER_LOAD_SBYTE_OFFSET, i16, sextloadi8_constant>;
		defm : MUBUFLoad_Pattern <BUFFER_LOAD_UBYTE_OFFSET, i16, az_extloadi8_constant>;
		arsenm: Originally I was avoiding these by promoting the load return type and truncating. Is this the same or is there an advantage from known bits types of things knowing it is really a 32-bit extload?
		tstellarAMD: The main problem is that extload must be legal, so it's hard to custom lower all extloads. I attempted to do this, and I think it would require a lot of work in the backend and also the legalizer to make this work. I'm not sure it's worth the effort at this point.
		defm : MUBUFLoad_Pattern <BUFFER_LOAD_SBYTE_OFFSET, i16, mubuf_sextloadi8>;
		defm : MUBUFLoad_Pattern <BUFFER_LOAD_UBYTE_OFFSET, i16, mubuf_az_extloadi8>;

		} // End Predicates = [isVI]

class MUBUFScratchLoadPat <MUBUF_Pseudo Instr, ValueType vt, PatFrag ld> : Pat <		class MUBUFScratchLoadPat <MUBUF_Pseudo Instr, ValueType vt, PatFrag ld> : Pat <
(vt (ld (MUBUFScratch v4i32:$srsrc, i32:$vaddr,		(vt (ld (MUBUFScratch v4i32:$srsrc, i32:$vaddr,
i32:$soffset, u16imm:$offset))),		i32:$soffset, u16imm:$offset))),
(Instr $vaddr, $srsrc, $soffset, $offset, 0, 0, 0)		(Instr $vaddr, $srsrc, $soffset, $offset, 0, 0, 0)
>;		>;

def : MUBUFScratchLoadPat <BUFFER_LOAD_SBYTE_OFFEN, i32, sextloadi8_private>;		def : MUBUFScratchLoadPat <BUFFER_LOAD_SBYTE_OFFEN, i32, sextloadi8_private>;
def : MUBUFScratchLoadPat <BUFFER_LOAD_UBYTE_OFFEN, i32, extloadi8_private>;		def : MUBUFScratchLoadPat <BUFFER_LOAD_UBYTE_OFFEN, i32, extloadi8_private>;
		def : MUBUFScratchLoadPat <BUFFER_LOAD_SBYTE_OFFEN, i16, sextloadi8_private>;
		def : MUBUFScratchLoadPat <BUFFER_LOAD_UBYTE_OFFEN, i16, extloadi8_private>;
def : MUBUFScratchLoadPat <BUFFER_LOAD_SSHORT_OFFEN, i32, sextloadi16_private>;		def : MUBUFScratchLoadPat <BUFFER_LOAD_SSHORT_OFFEN, i32, sextloadi16_private>;
def : MUBUFScratchLoadPat <BUFFER_LOAD_USHORT_OFFEN, i32, extloadi16_private>;		def : MUBUFScratchLoadPat <BUFFER_LOAD_USHORT_OFFEN, i32, extloadi16_private>;
def : MUBUFScratchLoadPat <BUFFER_LOAD_DWORD_OFFEN, i32, load_private>;		def : MUBUFScratchLoadPat <BUFFER_LOAD_DWORD_OFFEN, i32, load_private>;
def : MUBUFScratchLoadPat <BUFFER_LOAD_DWORDX2_OFFEN, v2i32, load_private>;		def : MUBUFScratchLoadPat <BUFFER_LOAD_DWORDX2_OFFEN, v2i32, load_private>;
def : MUBUFScratchLoadPat <BUFFER_LOAD_DWORDX4_OFFEN, v4i32, load_private>;		def : MUBUFScratchLoadPat <BUFFER_LOAD_DWORDX4_OFFEN, v4i32, load_private>;

// BUFFER_LOAD_DWORD*, addr64=0		// BUFFER_LOAD_DWORD*, addr64=0
multiclass MUBUF_Load_Dword <ValueType vt,		multiclass MUBUF_Load_Dword <ValueType vt,
[56 lines not shown]	def : Pat <
(Instr_OFFSET $val, $rsrc, $soffset, (as_i16imm $offset), 1, 0, 0)		(Instr_OFFSET $val, $rsrc, $soffset, (as_i16imm $offset), 1, 0, 0)
>;		>;
}		}
let Predicates = [isSICI] in {		let Predicates = [isSICI] in {
defm : MUBUFStore_Atomic_Pattern <BUFFER_STORE_DWORD_ADDR64, BUFFER_STORE_DWORD_OFFSET, i32, global_store_atomic>;		defm : MUBUFStore_Atomic_Pattern <BUFFER_STORE_DWORD_ADDR64, BUFFER_STORE_DWORD_OFFSET, i32, global_store_atomic>;
defm : MUBUFStore_Atomic_Pattern <BUFFER_STORE_DWORDX2_ADDR64, BUFFER_STORE_DWORDX2_OFFSET, i64, global_store_atomic>;		defm : MUBUFStore_Atomic_Pattern <BUFFER_STORE_DWORDX2_ADDR64, BUFFER_STORE_DWORDX2_OFFSET, i64, global_store_atomic>;
} // End Predicates = [isSICI]		} // End Predicates = [isSICI]


		multiclass MUBUFStore_Pattern <MUBUF_Pseudo Instr_OFFSET, ValueType vt,
		PatFrag st> {

		def : Pat <
		(st vt:$vdata, (MUBUFOffset v4i32:$srsrc, i32:$soffset,
		i16:$offset, i1:$glc, i1:$slc, i1:$tfe)),
		(Instr_OFFSET $vdata, $srsrc, $soffset, $offset, $glc, $slc, $tfe)
		>;
		}

		defm : MUBUFStore_Pattern <BUFFER_STORE_BYTE_OFFSET, i16, truncstorei8_global>;
		defm : MUBUFStore_Pattern <BUFFER_STORE_SHORT_OFFSET, i16, global_store>;

class MUBUFScratchStorePat <MUBUF_Pseudo Instr, ValueType vt, PatFrag st> : Pat <		class MUBUFScratchStorePat <MUBUF_Pseudo Instr, ValueType vt, PatFrag st> : Pat <
(st vt:$value, (MUBUFScratch v4i32:$srsrc, i32:$vaddr, i32:$soffset,		(st vt:$value, (MUBUFScratch v4i32:$srsrc, i32:$vaddr, i32:$soffset,
u16imm:$offset)),		u16imm:$offset)),
(Instr $value, $vaddr, $srsrc, $soffset, $offset, 0, 0, 0)		(Instr $value, $vaddr, $srsrc, $soffset, $offset, 0, 0, 0)
>;		>;

def : MUBUFScratchStorePat <BUFFER_STORE_BYTE_OFFEN, i32, truncstorei8_private>;		def : MUBUFScratchStorePat <BUFFER_STORE_BYTE_OFFEN, i32, truncstorei8_private>;
def : MUBUFScratchStorePat <BUFFER_STORE_SHORT_OFFEN, i32, truncstorei16_private>;		def : MUBUFScratchStorePat <BUFFER_STORE_SHORT_OFFEN, i32, truncstorei16_private>;
		def : MUBUFScratchStorePat <BUFFER_STORE_BYTE_OFFEN, i16, truncstorei8_private>;
		def : MUBUFScratchStorePat <BUFFER_STORE_SHORT_OFFEN, i16, store_private>;
def : MUBUFScratchStorePat <BUFFER_STORE_DWORD_OFFEN, i32, store_private>;		def : MUBUFScratchStorePat <BUFFER_STORE_DWORD_OFFEN, i32, store_private>;
def : MUBUFScratchStorePat <BUFFER_STORE_DWORDX2_OFFEN, v2i32, store_private>;		def : MUBUFScratchStorePat <BUFFER_STORE_DWORDX2_OFFEN, v2i32, store_private>;
def : MUBUFScratchStorePat <BUFFER_STORE_DWORDX4_OFFEN, v4i32, store_private>;		def : MUBUFScratchStorePat <BUFFER_STORE_DWORDX4_OFFEN, v4i32, store_private>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// MTBUF Patterns		// MTBUF Patterns
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

[269 lines not shown]

lib/Target/AMDGPU/DSInstructions.td

	[483 lines not shown]

	class DSReadPat <DS_Pseudo inst, ValueType vt, PatFrag frag> : Pat <			class DSReadPat <DS_Pseudo inst, ValueType vt, PatFrag frag> : Pat <
	(vt (frag (DS1Addr1Offset i32:$ptr, i32:$offset))),			(vt (frag (DS1Addr1Offset i32:$ptr, i32:$offset))),
	(inst $ptr, (as_i16imm $offset), (i1 0))			(inst $ptr, (as_i16imm $offset), (i1 0))
	>;			>;

	def : DSReadPat <DS_READ_I8, i32, si_sextload_local_i8>;			def : DSReadPat <DS_READ_I8, i32, si_sextload_local_i8>;
	def : DSReadPat <DS_READ_U8, i32, si_az_extload_local_i8>;			def : DSReadPat <DS_READ_U8, i32, si_az_extload_local_i8>;
				def : DSReadPat <DS_READ_I8, i16, si_sextload_local_i8>;
				def : DSReadPat <DS_READ_U8, i16, si_az_extload_local_i8>;
				def : DSReadPat <DS_READ_I16, i32, si_sextload_local_i16>;
	def : DSReadPat <DS_READ_I16, i32, si_sextload_local_i16>;			def : DSReadPat <DS_READ_I16, i32, si_sextload_local_i16>;
	def : DSReadPat <DS_READ_U16, i32, si_az_extload_local_i16>;			def : DSReadPat <DS_READ_U16, i32, si_az_extload_local_i16>;
				def : DSReadPat <DS_READ_U16, i16, si_load_local>;
	def : DSReadPat <DS_READ_B32, i32, si_load_local>;			def : DSReadPat <DS_READ_B32, i32, si_load_local>;

	let AddedComplexity = 100 in {			let AddedComplexity = 100 in {

	def : DSReadPat <DS_READ_B64, v2i32, si_load_local_align8>;			def : DSReadPat <DS_READ_B64, v2i32, si_load_local_align8>;

	} // End AddedComplexity = 100			} // End AddedComplexity = 100

	def : Pat <			def : Pat <
	(v2i32 (si_load_local (DS64Bit4ByteAligned i32:$ptr, i8:$offset0,			(v2i32 (si_load_local (DS64Bit4ByteAligned i32:$ptr, i8:$offset0,
	i8:$offset1))),			i8:$offset1))),
	(DS_READ2_B32 $ptr, $offset0, $offset1, (i1 0))			(DS_READ2_B32 $ptr, $offset0, $offset1, (i1 0))
	>;			>;

	class DSWritePat <DS_Pseudo inst, ValueType vt, PatFrag frag> : Pat <			class DSWritePat <DS_Pseudo inst, ValueType vt, PatFrag frag> : Pat <
	(frag vt:$value, (DS1Addr1Offset i32:$ptr, i32:$offset)),			(frag vt:$value, (DS1Addr1Offset i32:$ptr, i32:$offset)),
	(inst $ptr, $value, (as_i16imm $offset), (i1 0))			(inst $ptr, $value, (as_i16imm $offset), (i1 0))
	>;			>;

	def : DSWritePat <DS_WRITE_B8, i32, si_truncstore_local_i8>;			def : DSWritePat <DS_WRITE_B8, i32, si_truncstore_local_i8>;
	def : DSWritePat <DS_WRITE_B16, i32, si_truncstore_local_i16>;			def : DSWritePat <DS_WRITE_B16, i32, si_truncstore_local_i16>;
				def : DSWritePat <DS_WRITE_B8, i16, si_truncstore_local_i8>;
				def : DSWritePat <DS_WRITE_B16, i16, si_store_local>;
	def : DSWritePat <DS_WRITE_B32, i32, si_store_local>;			def : DSWritePat <DS_WRITE_B32, i32, si_store_local>;

	let AddedComplexity = 100 in {			let AddedComplexity = 100 in {

	def : DSWritePat <DS_WRITE_B64, v2i32, si_store_local_align8>;			def : DSWritePat <DS_WRITE_B64, v2i32, si_store_local_align8>;
	} // End AddedComplexity = 100			} // End AddedComplexity = 100

	def : Pat <			def : Pat <
	(si_store_local v2i32:$value, (DS64Bit4ByteAligned i32:$ptr, i8:$offset0,			(si_store_local v2i32:$value, (DS64Bit4ByteAligned i32:$ptr, i8:$offset0,
	i8:$offset1)),			i8:$offset1)),
	(DS_WRITE2_B32 $ptr, (EXTRACT_SUBREG $value, sub0),			(DS_WRITE2_B32 $ptr, (i32 (EXTRACT_SUBREG $value, sub0)),
	(EXTRACT_SUBREG $value, sub1), $offset0, $offset1,			(i32 (EXTRACT_SUBREG $value, sub1)), $offset0, $offset1,
	(i1 0))			(i1 0))
	>;			>;

	class DSAtomicRetPat<DS_Pseudo inst, ValueType vt, PatFrag frag> : Pat <			class DSAtomicRetPat<DS_Pseudo inst, ValueType vt, PatFrag frag> : Pat <
	(frag (DS1Addr1Offset i32:$ptr, i32:$offset), vt:$value),			(frag (DS1Addr1Offset i32:$ptr, i32:$offset), vt:$value),
	(inst $ptr, $value, (as_i16imm $offset), (i1 0))			(inst $ptr, $value, (as_i16imm $offset), (i1 0))
	>;			>;

	[366 lines not shown]

lib/Target/AMDGPU/FLATInstructions.td

[335 lines not shown]	class FlatAtomicPat <FLAT_Pseudo inst, SDPatternOperator node, ValueType vt,
(vt (node i64:$addr, data_vt:$data)),		(vt (node i64:$addr, data_vt:$data)),
(inst $addr, $data, 0, 0)		(inst $addr, $data, 0, 0)
>;		>;

let Predicates = [isCIVI] in {		let Predicates = [isCIVI] in {

def : FlatLoadPat <FLAT_LOAD_UBYTE, flat_az_extloadi8, i32>;		def : FlatLoadPat <FLAT_LOAD_UBYTE, flat_az_extloadi8, i32>;
def : FlatLoadPat <FLAT_LOAD_SBYTE, flat_sextloadi8, i32>;		def : FlatLoadPat <FLAT_LOAD_SBYTE, flat_sextloadi8, i32>;
		def : FlatLoadPat <FLAT_LOAD_UBYTE, flat_az_extloadi8, i16>;
		def : FlatLoadPat <FLAT_LOAD_SBYTE, flat_sextloadi8, i16>;
def : FlatLoadPat <FLAT_LOAD_USHORT, flat_az_extloadi16, i32>;		def : FlatLoadPat <FLAT_LOAD_USHORT, flat_az_extloadi16, i32>;
def : FlatLoadPat <FLAT_LOAD_SSHORT, flat_sextloadi16, i32>;		def : FlatLoadPat <FLAT_LOAD_SSHORT, flat_sextloadi16, i32>;
def : FlatLoadPat <FLAT_LOAD_DWORD, flat_load, i32>;		def : FlatLoadPat <FLAT_LOAD_DWORD, flat_load, i32>;
def : FlatLoadPat <FLAT_LOAD_DWORDX2, flat_load, v2i32>;		def : FlatLoadPat <FLAT_LOAD_DWORDX2, flat_load, v2i32>;
def : FlatLoadPat <FLAT_LOAD_DWORDX4, flat_load, v4i32>;		def : FlatLoadPat <FLAT_LOAD_DWORDX4, flat_load, v4i32>;

def : FlatLoadAtomicPat <FLAT_LOAD_DWORD, atomic_flat_load, i32>;		def : FlatLoadAtomicPat <FLAT_LOAD_DWORD, atomic_flat_load, i32>;
def : FlatLoadAtomicPat <FLAT_LOAD_DWORDX2, atomic_flat_load, i64>;		def : FlatLoadAtomicPat <FLAT_LOAD_DWORDX2, atomic_flat_load, i64>;
[32 lines not shown]
def : FlatAtomicPat <FLAT_ATOMIC_UMIN_X2_RTN, atomic_umin_global, i64>;		def : FlatAtomicPat <FLAT_ATOMIC_UMIN_X2_RTN, atomic_umin_global, i64>;
def : FlatAtomicPat <FLAT_ATOMIC_OR_X2_RTN, atomic_or_global, i64>;		def : FlatAtomicPat <FLAT_ATOMIC_OR_X2_RTN, atomic_or_global, i64>;
def : FlatAtomicPat <FLAT_ATOMIC_SWAP_X2_RTN, atomic_swap_global, i64>;		def : FlatAtomicPat <FLAT_ATOMIC_SWAP_X2_RTN, atomic_swap_global, i64>;
def : FlatAtomicPat <FLAT_ATOMIC_CMPSWAP_X2_RTN, atomic_cmp_swap_global, i64, v2i64>;		def : FlatAtomicPat <FLAT_ATOMIC_CMPSWAP_X2_RTN, atomic_cmp_swap_global, i64, v2i64>;
def : FlatAtomicPat <FLAT_ATOMIC_XOR_X2_RTN, atomic_xor_global, i64>;		def : FlatAtomicPat <FLAT_ATOMIC_XOR_X2_RTN, atomic_xor_global, i64>;

} // End Predicates = [isCIVI]		} // End Predicates = [isCIVI]

		let Predicates = [isVI] in {
		def : FlatStorePat <FLAT_STORE_SHORT, flat_truncstorei8, i16>;
		def : FlatStorePat <FLAT_STORE_SHORT, flat_store, i16>;
		}


//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Target		// Target
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// CI		// CI
▲ Show 20 Lines • Show All 125 Lines • Show Last 20 Lines

lib/Target/AMDGPU/SIISelLowering.cpp

Show All 28 Lines
#include "llvm/ADT/StringSwitch.h"		#include "llvm/ADT/StringSwitch.h"
#include "llvm/CodeGen/CallingConvLower.h"		#include "llvm/CodeGen/CallingConvLower.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/SelectionDAG.h"		#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/CodeGen/Analysis.h"		#include "llvm/CodeGen/Analysis.h"
#include "llvm/IR/DiagnosticInfo.h"		#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"

		kzhuravl: Alphabetize
using namespace llvm;		using namespace llvm;

static cl::opt<bool> EnableVGPRIndexMode(		static cl::opt<bool> EnableVGPRIndexMode(
"amdgpu-vgpr-index-mode",		"amdgpu-vgpr-index-mode",
cl::desc("Use GPR indexing mode instead of movrel for vector indexing"),		cl::desc("Use GPR indexing mode instead of movrel for vector indexing"),
cl::init(false));		cl::init(false));


Show All 11 Lines	SITargetLowering::SITargetLowering(const TargetMachine &TM,
const SISubtarget &STI)		const SISubtarget &STI)
: AMDGPUTargetLowering(TM, STI) {		: AMDGPUTargetLowering(TM, STI) {
addRegisterClass(MVT::i1, &AMDGPU::VReg_1RegClass);		addRegisterClass(MVT::i1, &AMDGPU::VReg_1RegClass);
addRegisterClass(MVT::i64, &AMDGPU::SReg_64RegClass);		addRegisterClass(MVT::i64, &AMDGPU::SReg_64RegClass);

addRegisterClass(MVT::i32, &AMDGPU::SReg_32RegClass);		addRegisterClass(MVT::i32, &AMDGPU::SReg_32RegClass);
addRegisterClass(MVT::f32, &AMDGPU::VGPR_32RegClass);		addRegisterClass(MVT::f32, &AMDGPU::VGPR_32RegClass);

addRegisterClass(MVT::f64, &AMDGPU::VReg_64RegClass);		addRegisterClass(MVT::f64, &AMDGPU::VReg_64RegClass);
		arsenm: I did this already, so this should check Subtarget->has16BitInsts()
addRegisterClass(MVT::v2i32, &AMDGPU::SReg_64RegClass);		addRegisterClass(MVT::v2i32, &AMDGPU::SReg_64RegClass);
addRegisterClass(MVT::v2f32, &AMDGPU::VReg_64RegClass);		addRegisterClass(MVT::v2f32, &AMDGPU::VReg_64RegClass);

addRegisterClass(MVT::v2i64, &AMDGPU::SReg_128RegClass);		addRegisterClass(MVT::v2i64, &AMDGPU::SReg_128RegClass);
addRegisterClass(MVT::v2f64, &AMDGPU::SReg_128RegClass);		addRegisterClass(MVT::v2f64, &AMDGPU::SReg_128RegClass);

addRegisterClass(MVT::v4i32, &AMDGPU::SReg_128RegClass);		addRegisterClass(MVT::v4i32, &AMDGPU::SReg_128RegClass);
addRegisterClass(MVT::v4f32, &AMDGPU::VReg_128RegClass);		addRegisterClass(MVT::v4f32, &AMDGPU::VReg_128RegClass);

addRegisterClass(MVT::v8i32, &AMDGPU::SReg_256RegClass);		addRegisterClass(MVT::v8i32, &AMDGPU::SReg_256RegClass);
addRegisterClass(MVT::v8f32, &AMDGPU::VReg_256RegClass);		addRegisterClass(MVT::v8f32, &AMDGPU::VReg_256RegClass);

addRegisterClass(MVT::v16i32, &AMDGPU::SReg_512RegClass);		addRegisterClass(MVT::v16i32, &AMDGPU::SReg_512RegClass);
addRegisterClass(MVT::v16f32, &AMDGPU::VReg_512RegClass);		addRegisterClass(MVT::v16f32, &AMDGPU::VReg_512RegClass);

		if (Subtarget->has16BitInsts())
		tstellarAMD: Do we still need this comment?
		kzhuravl: I do not think we need this comment
		addRegisterClass(MVT::i16, &AMDGPU::SReg_32RegClass);
		kzhuravl: Subtarget->has16BitInsts()

computeRegisterProperties(STI.getRegisterInfo());		computeRegisterProperties(STI.getRegisterInfo());

// We need to custom lower vector stores from local memory		// We need to custom lower vector stores from local memory
setOperationAction(ISD::LOAD, MVT::v2i32, Custom);		setOperationAction(ISD::LOAD, MVT::v2i32, Custom);
setOperationAction(ISD::LOAD, MVT::v4i32, Custom);		setOperationAction(ISD::LOAD, MVT::v4i32, Custom);
setOperationAction(ISD::LOAD, MVT::v8i32, Custom);		setOperationAction(ISD::LOAD, MVT::v8i32, Custom);
setOperationAction(ISD::LOAD, MVT::v16i32, Custom);		setOperationAction(ISD::LOAD, MVT::v16i32, Custom);
setOperationAction(ISD::LOAD, MVT::i1, Custom);		setOperationAction(ISD::LOAD, MVT::i1, Custom);
▲ Show 20 Lines • Show All 127 Lines • ▼ Show 20 Lines	SITargetLowering::SITargetLowering(const TargetMachine &TM,

setOperationAction(ISD::FFLOOR, MVT::f64, Legal);		setOperationAction(ISD::FFLOOR, MVT::f64, Legal);

setOperationAction(ISD::FSIN, MVT::f32, Custom);		setOperationAction(ISD::FSIN, MVT::f32, Custom);
setOperationAction(ISD::FCOS, MVT::f32, Custom);		setOperationAction(ISD::FCOS, MVT::f32, Custom);
setOperationAction(ISD::FDIV, MVT::f32, Custom);		setOperationAction(ISD::FDIV, MVT::f32, Custom);
setOperationAction(ISD::FDIV, MVT::f64, Custom);		setOperationAction(ISD::FDIV, MVT::f64, Custom);

		if (Subtarget->has16BitInsts()) {
		kzhuravl: Subtarget->has16BitInsts()
		setOperationAction(ISD::Constant, MVT::i16, Legal);

		setOperationAction(ISD::SMIN, MVT::i16, Legal);
		setOperationAction(ISD::SMAX, MVT::i16, Legal);

		setOperationAction(ISD::UMIN, MVT::i16, Legal);
		setOperationAction(ISD::UMAX, MVT::i16, Legal);

		setOperationAction(ISD::SETCC, MVT::i16, Promote);
		AddPromotedToType(ISD::SETCC, MVT::i16, MVT::i32);

		setOperationAction(ISD::SIGN_EXTEND, MVT::i16, Promote);
		AddPromotedToType(ISD::SIGN_EXTEND, MVT::i16, MVT::i32);

		setOperationAction(ISD::ROTR, MVT::i16, Promote);
		setOperationAction(ISD::ROTL, MVT::i16, Promote);

		setOperationAction(ISD::SDIV, MVT::i16, Promote);
		setOperationAction(ISD::UDIV, MVT::i16, Promote);
		setOperationAction(ISD::SREM, MVT::i16, Promote);
		setOperationAction(ISD::UREM, MVT::i16, Promote);

		setOperationAction(ISD::BSWAP, MVT::i16, Promote);
		setOperationAction(ISD::BITREVERSE, MVT::i16, Promote);

		setOperationAction(ISD::CTTZ, MVT::i16, Promote);
		setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i16, Promote);
		setOperationAction(ISD::CTLZ, MVT::i16, Promote);
		setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i16, Promote);

		setOperationAction(ISD::SELECT_CC, MVT::i16, Expand);

		setOperationAction(ISD::BR_CC, MVT::i16, Expand);

		setOperationAction(ISD::LOAD, MVT::i16, Custom);

		setTruncStoreAction(MVT::i64, MVT::i16, Expand);

		setOperationAction(ISD::UINT_TO_FP, MVT::i16, Promote);
		AddPromotedToType(ISD::UINT_TO_FP, MVT::i16, MVT::i32);
		setOperationAction(ISD::SINT_TO_FP, MVT::i16, Promote);
		tstellarAMD: Are these really necessary? I thought making a type legal marked these operations legal by default.
		AddPromotedToType(ISD::SINT_TO_FP, MVT::i16, MVT::i32);
		setOperationAction(ISD::FP16_TO_FP, MVT::i16, Promote);
		AddPromotedToType(ISD::FP16_TO_FP, MVT::i16, MVT::i32);
		kzhuravl: Detabify
		setOperationAction(ISD::FP_TO_FP16, MVT::i16, Promote);
		AddPromotedToType(ISD::FP_TO_FP16, MVT::i16, MVT::i32);
		kzhuravl: Detabify
		}

		kzhuravl: Detabify
setTargetDAGCombine(ISD::FADD);		setTargetDAGCombine(ISD::FADD);
		tstellarAMD: Same with these too. Are they really necessary?
		arsenm: min/max need to be explicitly made legal, but that should be a separate patch. setcc should be promote for now until those are added later
setTargetDAGCombine(ISD::FSUB);		setTargetDAGCombine(ISD::FSUB);
		kzhuravl: Detabify
setTargetDAGCombine(ISD::FMINNUM);		setTargetDAGCombine(ISD::FMINNUM);
		kzhuravl: Remove extra new line
setTargetDAGCombine(ISD::FMAXNUM);		setTargetDAGCombine(ISD::FMAXNUM);
setTargetDAGCombine(ISD::SMIN);		setTargetDAGCombine(ISD::SMIN);
setTargetDAGCombine(ISD::SMAX);		setTargetDAGCombine(ISD::SMAX);
setTargetDAGCombine(ISD::UMIN);		setTargetDAGCombine(ISD::UMIN);
setTargetDAGCombine(ISD::UMAX);		setTargetDAGCombine(ISD::UMAX);
setTargetDAGCombine(ISD::SETCC);		setTargetDAGCombine(ISD::SETCC);
setTargetDAGCombine(ISD::AND);		setTargetDAGCombine(ISD::AND);
setTargetDAGCombine(ISD::OR);		setTargetDAGCombine(ISD::OR);
▲ Show 20 Lines • Show All 1,448 Lines • ▼ Show 20 Lines
//		//
// v_fma_f32 takes 4 or 16 cycles depending on the device, so it is profitable		// v_fma_f32 takes 4 or 16 cycles depending on the device, so it is profitable
// only on full rate devices. Normally, we should prefer selecting v_mad_f32		// only on full rate devices. Normally, we should prefer selecting v_mad_f32
// which we can always do even without fused FP ops since it returns the same		// which we can always do even without fused FP ops since it returns the same
// result as the separate operations and since it is always full		// result as the separate operations and since it is always full
// rate. Therefore, we lie and report that it is not faster for f32. v_mad_f32		// rate. Therefore, we lie and report that it is not faster for f32. v_mad_f32
// however does not support denormals, so we do report fma as faster if we have		// however does not support denormals, so we do report fma as faster if we have
// a fast fma device and require denormals.		// a fast fma device and require denormals.
//		//
bool SITargetLowering::isFMAFasterThanFMulAndFAdd(EVT VT) const {		bool SITargetLowering::isFMAFasterThanFMulAndFAdd(EVT VT) const {
VT = VT.getScalarType();		VT = VT.getScalarType();

if (!VT.isSimple())		if (!VT.isSimple())
return false;		return false;

switch (VT.getSimpleVT().SimpleTy) {		switch (VT.getSimpleVT().SimpleTy) {
case MVT::f32:		case MVT::f32:
// This is as fast on some subtargets. However, we always have full rate f32		// This is as fast on some subtargets. However, we always have full rate f32
// mad available which returns the same result as the separate operations		// mad available which returns the same result as the separate operations
// which we should prefer over fma. We can't use this if we want to support		// which we should prefer over fma. We can't use this if we want to support
// denormals, so only report this in these cases.		// denormals, so only report this in these cases.
return Subtarget->hasFP32Denormals() && Subtarget->hasFastFMAF32();		return Subtarget->hasFP32Denormals() && Subtarget->hasFastFMAF32();
		tstellarAMD: Why does i16 need special handling here? This seems to be nearly identical to the block of code directly below.
case MVT::f64:		case MVT::f64:
return true;		return true;
default:		default:
break;		break;
}		}

return false;		return false;
}		}
▲ Show 20 Lines • Show All 156 Lines • ▼ Show 20 Lines	if (!isCFIntrinsic(Intr)) {
return BRCOND;		return BRCOND;
}		}

bool HaveChain = Intr->getOpcode() == ISD::INTRINSIC_VOID ||		bool HaveChain = Intr->getOpcode() == ISD::INTRINSIC_VOID ||
Intr->getOpcode() == ISD::INTRINSIC_W_CHAIN;		Intr->getOpcode() == ISD::INTRINSIC_W_CHAIN;

assert(!SetCC ||		assert(!SetCC ||
(SetCC->getConstantOperandVal(1) == 1 &&		(SetCC->getConstantOperandVal(1) == 1 &&
cast<CondCodeSDNode>(SetCC->getOperand(2).getNode())->get() ==		cast<CondCodeSDNode>(SetCC->getOperand(2).getNode())->get() ==
		arsenm: Why is this part of the patch? This looks unrelated
ISD::SETNE));		ISD::SETNE));

// operands of the new intrinsic call		// operands of the new intrinsic call
SmallVector<SDValue, 4> Ops;		SmallVector<SDValue, 4> Ops;
if (HaveChain)		if (HaveChain)
Ops.push_back(BRCOND.getOperand(0));		Ops.push_back(BRCOND.getOperand(0));

Ops.append(Intr->op_begin() + (HaveChain ? 1 : 0), Intr->op_end());		Ops.append(Intr->op_begin() + (HaveChain ? 1 : 0), Intr->op_end());
▲ Show 20 Lines • Show All 76 Lines • ▼ Show 20 Lines	Value V = UndefValue::get(PointerType::get(Type::getInt8Ty(DAG.getContext()),
AMDGPUAS::CONSTANT_ADDRESS));		AMDGPUAS::CONSTANT_ADDRESS));

MachinePointerInfo PtrInfo(V, StructOffset);		MachinePointerInfo PtrInfo(V, StructOffset);
return DAG.getLoad(MVT::i32, SL, QueuePtr.getValue(1), Ptr, PtrInfo,		return DAG.getLoad(MVT::i32, SL, QueuePtr.getValue(1), Ptr, PtrInfo,
MinAlign(64, StructOffset),		MinAlign(64, StructOffset),
MachineMemOperand::MODereferenceable |		MachineMemOperand::MODereferenceable |
MachineMemOperand::MOInvariant);		MachineMemOperand::MOInvariant);
}		}

SDValue SITargetLowering::lowerADDRSPACECAST(SDValue Op,		SDValue SITargetLowering::lowerADDRSPACECAST(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
SDLoc SL(Op);		SDLoc SL(Op);
const AddrSpaceCastSDNode *ASC = cast<AddrSpaceCastSDNode>(Op);		const AddrSpaceCastSDNode *ASC = cast<AddrSpaceCastSDNode>(Op);

SDValue Src = ASC->getOperand(0);		SDValue Src = ASC->getOperand(0);

// FIXME: Really support non-0 null pointers.		// FIXME: Really support non-0 null pointers.
SDValue SegmentNullPtr = DAG.getConstant(-1, SL, MVT::i32);		SDValue SegmentNullPtr = DAG.getConstant(-1, SL, MVT::i32);
		tstellarAMD: Why do we need to custom lower i16 stores? Can't we just mark them as promote?
		arsenm: Load/store promote expects an equal size type for a bitcast promote. This is the same problem that i1 has, so it should follow that example
SDValue FlatNullPtr = DAG.getConstant(0, SL, MVT::i64);		SDValue FlatNullPtr = DAG.getConstant(0, SL, MVT::i64);

// flat -> local/private		// flat -> local/private
if (ASC->getSrcAddressSpace() == AMDGPUAS::FLAT_ADDRESS) {		if (ASC->getSrcAddressSpace() == AMDGPUAS::FLAT_ADDRESS) {
if (ASC->getDestAddressSpace() == AMDGPUAS::LOCAL_ADDRESS ||		if (ASC->getDestAddressSpace() == AMDGPUAS::LOCAL_ADDRESS ||
ASC->getDestAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS) {		ASC->getDestAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS) {
SDValue NonNull = DAG.getSetCC(SL, MVT::i1, Src, FlatNullPtr, ISD::SETNE);		SDValue NonNull = DAG.getSetCC(SL, MVT::i1, Src, FlatNullPtr, ISD::SETNE);
SDValue Ptr = DAG.getNode(ISD::TRUNCATE, SL, MVT::i32, Src);		SDValue Ptr = DAG.getNode(ISD::TRUNCATE, SL, MVT::i32, Src);
▲ Show 20 Lines • Show All 564 Lines • ▼ Show 20 Lines
}		}

SDValue SITargetLowering::LowerLOAD(SDValue Op, SelectionDAG &DAG) const {		SDValue SITargetLowering::LowerLOAD(SDValue Op, SelectionDAG &DAG) const {
SDLoc DL(Op);		SDLoc DL(Op);
LoadSDNode *Load = cast<LoadSDNode>(Op);		LoadSDNode *Load = cast<LoadSDNode>(Op);
ISD::LoadExtType ExtType = Load->getExtensionType();		ISD::LoadExtType ExtType = Load->getExtensionType();
EVT MemVT = Load->getMemoryVT();		EVT MemVT = Load->getMemoryVT();

if (ExtType == ISD::NON_EXTLOAD && MemVT.getSizeInBits() < 32) {		if (ExtType == ISD::NON_EXTLOAD && MemVT.getSizeInBits() < 32) {
assert(MemVT == MVT::i1 && "Only i1 non-extloads expected");
// FIXME: Copied from PPC		// FIXME: Copied from PPC
		tstellarAMD: Does this need to be removed?
// First, load into 32 bits, then truncate to 1 bit.		// First, load into 32 bits, then truncate to 1 bit.

SDValue Chain = Load->getChain();		SDValue Chain = Load->getChain();
SDValue BasePtr = Load->getBasePtr();		SDValue BasePtr = Load->getBasePtr();
MachineMemOperand *MMO = Load->getMemOperand();		MachineMemOperand *MMO = Load->getMemOperand();

		EVT RealMemVT = (MemVT == MVT::i1) ? MVT::i8 : MVT::i16;
		tstellarAMD: Coding style. Variable names should start with a capital.

SDValue NewLD = DAG.getExtLoad(ISD::EXTLOAD, DL, MVT::i32, Chain,		SDValue NewLD = DAG.getExtLoad(ISD::EXTLOAD, DL, MVT::i32, Chain,
BasePtr, MVT::i8, MMO);		BasePtr, RealMemVT, MMO);

SDValue Ops[] = {		SDValue Ops[] = {
DAG.getNode(ISD::TRUNCATE, DL, MemVT, NewLD),		DAG.getNode(ISD::TRUNCATE, DL, MemVT, NewLD),
NewLD.getValue(1)		NewLD.getValue(1)
};		};

return DAG.getMergeValues(Ops, DL);		return DAG.getMergeValues(Ops, DL);
}		}
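The non-extending load path above widens the memory access and then truncates the result back to the narrow type. A rough standalone sketch of that idea on plain integers rather than SelectionDAG nodes follows; `widenedLoad`, `truncateTo`, and `loadSmallInt` are hypothetical names, and the little-endian byte order is an assumption of the sketch, not something the patch states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: read memBits (8 or 16) bits from memory into a 32-bit
// value, as an extending load would. Little-endian layout is assumed. The
// upper bits are zero here; a real any-extending load leaves them unspecified.
uint32_t widenedLoad(const uint8_t *ptr, unsigned memBits) {
  assert(memBits == 8 || memBits == 16);
  uint32_t value = ptr[0];
  if (memBits == 16)
    value |= static_cast<uint32_t>(ptr[1]) << 8;
  return value;
}

// Keep only the low `bits` bits, modelling the TRUNCATE back to i1 or i16.
uint32_t truncateTo(uint32_t value, unsigned bits) {
  assert(bits >= 1 && bits < 32);
  return value & ((1u << bits) - 1u);
}

// Models the choice in the patch: i1 is loaded as i8, i16 is loaded as i16,
// and the 32-bit result is then truncated to the requested width.
uint32_t loadSmallInt(const uint8_t *ptr, unsigned valueBits) {
  unsigned memBits = (valueBits == 1) ? 8 : 16;
  return truncateTo(widenedLoad(ptr, memBits), valueBits);
}
```

The memory width is chosen from the value type, which mirrors the `RealMemVT` selection in the diff; everything else is illustrative only.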
▲ Show 20 Lines • Show All 797 Lines • ▼ Show 20 Lines	if (Signed) {
if (K0->getAPIntValue().sge(K1->getAPIntValue()))		if (K0->getAPIntValue().sge(K1->getAPIntValue()))
return SDValue();		return SDValue();
} else {		} else {
if (K0->getAPIntValue().uge(K1->getAPIntValue()))		if (K0->getAPIntValue().uge(K1->getAPIntValue()))
return SDValue();		return SDValue();
}		}

EVT VT = K0->getValueType(0);		EVT VT = K0->getValueType(0);

		MVT NVT = MVT::i32;
		unsigned ExtOp = Signed ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;

		arsenm: Define on same line
		SDValue Tmp1, Tmp2, Tmp3;
		Tmp1 = DAG.getNode(ExtOp, SL, NVT, Op0->getOperand(0));
		Tmp2 = DAG.getNode(ExtOp, SL, NVT, Op0->getOperand(1));
		Tmp3 = DAG.getNode(ExtOp, SL, NVT, Op1);

		if (VT == MVT::i16) {
		Tmp1 = DAG.getNode(Signed ? AMDGPUISD::SMED3 : AMDGPUISD::UMED3, SL, NVT,
		Tmp1, Tmp2, Tmp3);

		return DAG.getNode(ISD::TRUNCATE, SL, VT, Tmp1);
		} else
return DAG.getNode(Signed ? AMDGPUISD::SMED3 : AMDGPUISD::UMED3, SL, VT,		return DAG.getNode(Signed ? AMDGPUISD::SMED3 : AMDGPUISD::UMED3, SL, VT,
Op0.getOperand(0), SDValue(K0, 0), SDValue(K1, 0));		Op0.getOperand(0), SDValue(K0, 0), SDValue(K1, 0));
}		}

static bool isKnownNeverSNan(SelectionDAG &DAG, SDValue Op) {		static bool isKnownNeverSNan(SelectionDAG &DAG, SDValue Op) {
if (!DAG.getTargetLoweringInfo().hasFloatingPointExceptions())		if (!DAG.getTargetLoweringInfo().hasFloatingPointExceptions())
return true;		return true;

return DAG.isKnownNeverNaN(Op);		return DAG.isKnownNeverNaN(Op);
}		}
▲ Show 20 Lines • Show All 704 Lines • Show Last 20 Lines

lib/Target/AMDGPU/SIInstrInfo.td

Show First 20 Lines • Show All 1,116 Lines • ▼ Show 20 Lines	def getAtomicNoRetOp : InstrMapping {
let RowFields = ["NoRetOp"];		let RowFields = ["NoRetOp"];
let ColFields = ["IsRet"];		let ColFields = ["IsRet"];
let KeyCol = ["1"];		let KeyCol = ["1"];
let ValueCols = [["0"]];		let ValueCols = [["0"]];
}		}

include "SIInstructions.td"		include "SIInstructions.td"
include "CIInstructions.td"		include "CIInstructions.td"
include "VIInstructions.td"

include "DSInstructions.td"		include "DSInstructions.td"
include "MIMGInstructions.td"		include "MIMGInstructions.td"

lib/Target/AMDGPU/SIInstructions.td

Show First 20 Lines • Show All 368 Lines • ▼ Show 20 Lines

def : Pat<		def : Pat<
(int_amdgcn_else i64:$src, bb:$target),		(int_amdgcn_else i64:$src, bb:$target),
(SI_ELSE $src, $target, 0)		(SI_ELSE $src, $target, 0)
>;		>;

def : Pat <		def : Pat <
(int_AMDGPU_kilp),		(int_AMDGPU_kilp),
(SI_KILL 0xbf800000)		(SI_KILL (i32 0xbf800000))
>;		>;

def : Pat <		def : Pat <
(int_SI_export imm:$en, imm:$vm, imm:$done, imm:$tgt, imm:$compr,		(int_SI_export imm:$en, imm:$vm, imm:$done, imm:$tgt, imm:$compr,
f32:$src0, f32:$src1, f32:$src2, f32:$src3),		f32:$src0, f32:$src1, f32:$src2, f32:$src3),
(EXP imm:$en, imm:$tgt, imm:$compr, imm:$done, imm:$vm,		(EXP imm:$en, imm:$tgt, imm:$compr, imm:$done, imm:$vm,
$src0, $src1, $src2, $src3)		$src0, $src1, $src2, $src3)
>;		>;
▲ Show 20 Lines • Show All 164 Lines • ▼ Show 20 Lines

/******** =================== ********/		/******** =================== ********/
/******** Src & Dst modifiers ********/		/******** Src & Dst modifiers ********/
/******** =================== ********/		/******** =================== ********/

def : Pat <		def : Pat <
(AMDGPUclamp (VOP3Mods0Clamp f32:$src0, i32:$src0_modifiers, i32:$omod),		(AMDGPUclamp (VOP3Mods0Clamp f32:$src0, i32:$src0_modifiers, i32:$omod),
(f32 FP_ZERO), (f32 FP_ONE)),		(f32 FP_ZERO), (f32 FP_ONE)),
(V_ADD_F32_e64 $src0_modifiers, $src0, 0, 0, 1, $omod)		(V_ADD_F32_e64 $src0_modifiers, $src0, 0, (i32 0), 1, $omod)
>;		>;

/******** ================================ ********/		/******** ================================ ********/
/******** Floating point absolute/negative ********/		/******** Floating point absolute/negative ********/
/******** ================================ ********/		/******** ================================ ********/

// Prevent expanding both fneg and fabs.		// Prevent expanding both fneg and fabs.

def : Pat <		def : Pat <
(fneg (fabs f32:$src)),		(fneg (fabs f32:$src)),
(S_OR_B32 $src, (S_MOV_B32 0x80000000)) // Set sign bit		(S_OR_B32 $src, (S_MOV_B32(i32 0x80000000))) // Set sign bit
>;		>;

// FIXME: Should use S_OR_B32		// FIXME: Should use S_OR_B32
def : Pat <		def : Pat <
(fneg (fabs f64:$src)),		(fneg (fabs f64:$src)),
(REG_SEQUENCE VReg_64,		(REG_SEQUENCE VReg_64,
(i32 (EXTRACT_SUBREG f64:$src, sub0)),		(i32 (EXTRACT_SUBREG f64:$src, sub0)),
sub0,		sub0,
(V_OR_B32_e32 (EXTRACT_SUBREG f64:$src, sub1),		(V_OR_B32_e32 (i32 (EXTRACT_SUBREG f64:$src, sub1)),
(V_MOV_B32_e32 0x80000000)), // Set sign bit.		(V_MOV_B32_e32 (i32 0x80000000))), // Set sign bit.
sub1)		sub1)
>;		>;

def : Pat <		def : Pat <
(fabs f32:$src),		(fabs f32:$src),
(V_AND_B32_e64 $src, (V_MOV_B32_e32 0x7fffffff))		(V_AND_B32_e64 $src, (V_MOV_B32_e32 (i32 0x7fffffff)))
>;		>;

def : Pat <		def : Pat <
(fneg f32:$src),		(fneg f32:$src),
(V_XOR_B32_e32 $src, (V_MOV_B32_e32 0x80000000))		(V_XOR_B32_e32 $src, (V_MOV_B32_e32 (i32 0x80000000)))
>;		>;

def : Pat <		def : Pat <
(fabs f64:$src),		(fabs f64:$src),
(REG_SEQUENCE VReg_64,		(REG_SEQUENCE VReg_64,
(i32 (EXTRACT_SUBREG f64:$src, sub0)),		(i32 (EXTRACT_SUBREG f64:$src, sub0)),
sub0,		sub0,
(V_AND_B32_e64 (EXTRACT_SUBREG f64:$src, sub1),		(V_AND_B32_e64 (i32 (EXTRACT_SUBREG f64:$src, sub1)),
(V_MOV_B32_e32 0x7fffffff)), // Set sign bit.		(V_MOV_B32_e32 (i32 0x7fffffff))), // Set sign bit.
sub1)		sub1)
>;		>;

def : Pat <		def : Pat <
(fneg f64:$src),		(fneg f64:$src),
(REG_SEQUENCE VReg_64,		(REG_SEQUENCE VReg_64,
(i32 (EXTRACT_SUBREG f64:$src, sub0)),		(i32 (EXTRACT_SUBREG f64:$src, sub0)),
sub0,		sub0,
(V_XOR_B32_e32 (EXTRACT_SUBREG f64:$src, sub1),		(V_XOR_B32_e32 (i32 (EXTRACT_SUBREG f64:$src, sub1)),
(V_MOV_B32_e32 0x80000000)),		(i32 (V_MOV_B32_e32 (i32 0x80000000)))),
sub1)		sub1)
>;		>;

/******** ================== ********/		/******** ================== ********/
/******** Immediate Patterns ********/		/******** Immediate Patterns ********/
/******** ================== ********/		/******** ================== ********/

def : Pat <		def : Pat <
▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines
/******** Intrinsic Patterns ********/		/******** Intrinsic Patterns ********/
/******** ================== ********/		/******** ================== ********/

def : POW_Common <V_LOG_F32_e32, V_EXP_F32_e32, V_MUL_LEGACY_F32_e32>;		def : POW_Common <V_LOG_F32_e32, V_EXP_F32_e32, V_MUL_LEGACY_F32_e32>;

def : Pat <		def : Pat <
(int_AMDGPU_cube v4f32:$src),		(int_AMDGPU_cube v4f32:$src),
(REG_SEQUENCE VReg_128,		(REG_SEQUENCE VReg_128,
(V_CUBETC_F32 0 /* src0_modifiers */, (EXTRACT_SUBREG $src, sub0),		(V_CUBETC_F32 0 /* src0_modifiers */, (f32 (EXTRACT_SUBREG $src, sub0)),
0 /* src1_modifiers */, (EXTRACT_SUBREG $src, sub1),		0 /* src1_modifiers */, (f32 (EXTRACT_SUBREG $src, sub1)),
0 /* src2_modifiers */, (EXTRACT_SUBREG $src, sub2),		0 /* src2_modifiers */, (f32 (EXTRACT_SUBREG $src, sub2)),
0 /* clamp */, 0 /* omod */), sub0,		0 /* clamp */, 0 /* omod */), sub0,
(V_CUBESC_F32 0 /* src0_modifiers */, (EXTRACT_SUBREG $src, sub0),		(V_CUBESC_F32 0 /* src0_modifiers */, (f32 (EXTRACT_SUBREG $src, sub0)),
0 /* src1_modifiers */,(EXTRACT_SUBREG $src, sub1),		0 /* src1_modifiers */,(f32 (EXTRACT_SUBREG $src, sub1)),
0 /* src2_modifiers */,(EXTRACT_SUBREG $src, sub2),		0 /* src2_modifiers */,(f32 (EXTRACT_SUBREG $src, sub2)),
0 /* clamp */, 0 /* omod */), sub1,		0 /* clamp */, 0 /* omod */), sub1,
(V_CUBEMA_F32 0 /* src1_modifiers */,(EXTRACT_SUBREG $src, sub0),		(V_CUBEMA_F32 0 /* src1_modifiers */,(f32 (EXTRACT_SUBREG $src, sub0)),
0 /* src1_modifiers */,(EXTRACT_SUBREG $src, sub1),		0 /* src1_modifiers */,(f32 (EXTRACT_SUBREG $src, sub1)),
0 /* src1_modifiers */,(EXTRACT_SUBREG $src, sub2),		0 /* src1_modifiers */,(f32 (EXTRACT_SUBREG $src, sub2)),
0 /* clamp */, 0 /* omod */), sub2,		0 /* clamp */, 0 /* omod */), sub2,
(V_CUBEID_F32 0 /* src1_modifiers */,(EXTRACT_SUBREG $src, sub0),		(V_CUBEID_F32 0 /* src1_modifiers */,(f32 (EXTRACT_SUBREG $src, sub0)),
0 /* src1_modifiers */,(EXTRACT_SUBREG $src, sub1),		0 /* src1_modifiers */,(f32 (EXTRACT_SUBREG $src, sub1)),
0 /* src1_modifiers */,(EXTRACT_SUBREG $src, sub2),		0 /* src1_modifiers */,(f32 (EXTRACT_SUBREG $src, sub2)),
0 /* clamp */, 0 /* omod */), sub3)		0 /* clamp */, 0 /* omod */), sub3)
>;		>;

def : Pat <		def : Pat <
(i32 (sext i1:$src0)),		(i32 (sext i1:$src0)),
(V_CNDMASK_B32_e64 (i32 0), (i32 -1), $src0)		(V_CNDMASK_B32_e64 (i32 0), (i32 -1), $src0)
>;		>;

class Ext32Pat <SDNode ext> : Pat <		class Ext32Pat <SDNode ext> : Pat <
(i32 (ext i1:$src0)),		(i32 (ext i1:$src0)),
(V_CNDMASK_B32_e64 (i32 0), (i32 1), $src0)		(V_CNDMASK_B32_e64 (i32 0), (i32 1), $src0)
>;		>;

def : Ext32Pat <zext>;		def : Ext32Pat <zext>;
def : Ext32Pat <anyext>;		def : Ext32Pat <anyext>;

// The multiplication scales from [0,1] to the unsigned integer range		// The multiplication scales from [0,1] to the unsigned integer range
def : Pat <		def : Pat <
(AMDGPUurecip i32:$src0),		(AMDGPUurecip i32:$src0),
(V_CVT_U32_F32_e32		(V_CVT_U32_F32_e32
(V_MUL_F32_e32 CONST.FP_UINT_MAX_PLUS_1,		(V_MUL_F32_e32 (i32 CONST.FP_UINT_MAX_PLUS_1),
(V_RCP_IFLAG_F32_e32 (V_CVT_F32_U32_e32 $src0))))		(V_RCP_IFLAG_F32_e32 (V_CVT_F32_U32_e32 $src0))))
>;		>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// VOP3 Patterns		// VOP3 Patterns
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def : IMad24Pat<V_MAD_I32_I24>;		def : IMad24Pat<V_MAD_I32_I24>;
▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	def : Pat <
(V_SAD_U32 $src0, $src1, $src2)		(V_SAD_U32 $src0, $src1, $src2)
>;		>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Conversion Patterns		// Conversion Patterns
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def : Pat<(i32 (sext_inreg i32:$src, i1)),		def : Pat<(i32 (sext_inreg i32:$src, i1)),
(S_BFE_I32 i32:$src, 65536)>; // 0 | 1 << 16		(S_BFE_I32 i32:$src, (i32 65536))>; // 0 | 1 << 16

// Handle sext_inreg in i64		// Handle sext_inreg in i64
def : Pat <		def : Pat <
(i64 (sext_inreg i64:$src, i1)),		(i64 (sext_inreg i64:$src, i1)),
(S_BFE_I64 i64:$src, 0x10000) // 0 | 1 << 16		(S_BFE_I64 i64:$src, (i32 0x10000)) // 0 | 1 << 16
		>;

		def : Pat <
		(i16 (sext_inreg i16:$src, i8)),
		(S_BFE_I32 $src, (i32 0x80000)) // 0 | 8 << 16
>;		>;

def : Pat <		def : Pat <
(i64 (sext_inreg i64:$src, i8)),		(i64 (sext_inreg i64:$src, i8)),
(S_BFE_I64 i64:$src, 0x80000) // 0 | 8 << 16		(S_BFE_I64 i64:$src, (i32 0x80000)) // 0 | 8 << 16
>;		>;

def : Pat <		def : Pat <
(i64 (sext_inreg i64:$src, i16)),		(i64 (sext_inreg i64:$src, i16)),
(S_BFE_I64 i64:$src, 0x100000) // 0 | 16 << 16		(S_BFE_I64 i64:$src, (i32 0x100000)) // 0 | 16 << 16
>;		>;

def : Pat <		def : Pat <
(i64 (sext_inreg i64:$src, i32)),		(i64 (sext_inreg i64:$src, i32)),
(S_BFE_I64 i64:$src, 0x200000) // 0 | 32 << 16		(S_BFE_I64 i64:$src, (i32 0x200000)) // 0 | 32 << 16
>;		>;
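The trailing comments on the S_BFE patterns above (offset, then width shifted left by 16) describe how each immediate is packed. A minimal standalone sketch of that arithmetic follows; `encodeBfeImm` and `checkSextInRegImmediates` are hypothetical helper names used only for illustration and are not part of the patch or of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the patch): packs a bitfield-extract
// immediate as offset | (width << 16), matching the pattern comments.
constexpr uint32_t encodeBfeImm(uint32_t offset, uint32_t width) {
  return offset | (width << 16);
}

// Checks the immediates used by the sext_inreg patterns in the diff.
void checkSextInRegImmediates() {
  assert(encodeBfeImm(0, 1) == 65536);      // sext_inreg from i1
  assert(encodeBfeImm(0, 8) == 0x80000);    // sext_inreg from i8
  assert(encodeBfeImm(0, 16) == 0x100000);  // sext_inreg from i16
  assert(encodeBfeImm(0, 32) == 0x200000);  // sext_inreg from i32
}
```

In C++ the shift operator binds tighter than bitwise OR, so the form written in the comments needs no extra parentheses to mean "offset OR (width shifted by 16)".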

def : Pat <		def : Pat <
(i64 (zext i32:$src)),		(i64 (zext i32:$src)),
(REG_SEQUENCE SReg_64, $src, sub0, (S_MOV_B32 0), sub1)		(REG_SEQUENCE SReg_64, $src, sub0, (S_MOV_B32 (i32 0)), sub1)
>;		>;

def : Pat <		def : Pat <
(i64 (anyext i32:$src)),		(i64 (anyext i32:$src)),
(REG_SEQUENCE SReg_64, $src, sub0, (i32 (IMPLICIT_DEF)), sub1)		(REG_SEQUENCE SReg_64, $src, sub0, (i32 (IMPLICIT_DEF)), sub1)
>;		>;

class ZExt_i64_i1_Pat <SDNode ext> : Pat <		class ZExt_i64_i1_Pat <SDNode ext> : Pat <
(i64 (ext i1:$src)),		(i64 (ext i1:$src)),
(REG_SEQUENCE VReg_64,		(REG_SEQUENCE VReg_64,
(V_CNDMASK_B32_e64 (i32 0), (i32 1), $src), sub0,		(V_CNDMASK_B32_e64 (i32 0), (i32 1), $src), sub0,
(S_MOV_B32 0), sub1)		(S_MOV_B32 (i32 0)), sub1)
>;		>;


def : ZExt_i64_i1_Pat<zext>;		def : ZExt_i64_i1_Pat<zext>;
def : ZExt_i64_i1_Pat<anyext>;		def : ZExt_i64_i1_Pat<anyext>;

// FIXME: We need to use COPY_TO_REGCLASS to work-around the fact that		// FIXME: We need to use COPY_TO_REGCLASS to work-around the fact that
// REG_SEQUENCE patterns don't support instructions with multiple outputs.		// REG_SEQUENCE patterns don't support instructions with multiple outputs.
def : Pat <		def : Pat <
(i64 (sext i32:$src)),		(i64 (sext i32:$src)),
(REG_SEQUENCE SReg_64, $src, sub0,		(REG_SEQUENCE SReg_64, $src, sub0,
(i32 (COPY_TO_REGCLASS (S_ASHR_I32 $src, 31), SReg_32_XM0)), sub1)		(i32 (COPY_TO_REGCLASS (S_ASHR_I32 $src, (i32 31)), SReg_32_XM0)), sub1)
>;		>;

def : Pat <		def : Pat <
(i64 (sext i1:$src)),		(i64 (sext i1:$src)),
(REG_SEQUENCE VReg_64,		(REG_SEQUENCE VReg_64,
(V_CNDMASK_B32_e64 0, -1, $src), sub0,		(V_CNDMASK_B32_e64 (i32 0), (i32 -1), $src), sub0,
(V_CNDMASK_B32_e64 0, -1, $src), sub1)		(V_CNDMASK_B32_e64 (i32 0), (i32 -1), $src), sub1)
>;		>;

class FPToI1Pat<Instruction Inst, int KOne, ValueType vt, SDPatternOperator fp_to_int> : Pat <		class FPToI1Pat<Instruction Inst, int KOne, ValueType kone_type, ValueType vt, SDPatternOperator fp_to_int> : Pat <
(i1 (fp_to_int (vt (VOP3Mods vt:$src0, i32:$src0_modifiers)))),		(i1 (fp_to_int (vt (VOP3Mods vt:$src0, i32:$src0_modifiers)))),
(i1 (Inst 0, KOne, $src0_modifiers, $src0, DSTCLAMP.NONE, DSTOMOD.NONE))		(i1 (Inst 0, (kone_type KOne), $src0_modifiers, $src0, DSTCLAMP.NONE, DSTOMOD.NONE))
>;		>;

def : FPToI1Pat<V_CMP_EQ_F32_e64, CONST.FP32_ONE, f32, fp_to_uint>;		def : FPToI1Pat<V_CMP_EQ_F32_e64, CONST.FP32_ONE, i32, f32, fp_to_uint>;
def : FPToI1Pat<V_CMP_EQ_F32_e64, CONST.FP32_NEG_ONE, f32, fp_to_sint>;		def : FPToI1Pat<V_CMP_EQ_F32_e64, CONST.FP32_NEG_ONE, i32, f32, fp_to_sint>;
def : FPToI1Pat<V_CMP_EQ_F64_e64, CONST.FP64_ONE, f64, fp_to_uint>;		def : FPToI1Pat<V_CMP_EQ_F64_e64, CONST.FP64_ONE, i64, f64, fp_to_uint>;
def : FPToI1Pat<V_CMP_EQ_F64_e64, CONST.FP64_NEG_ONE, f64, fp_to_sint>;		def : FPToI1Pat<V_CMP_EQ_F64_e64, CONST.FP64_NEG_ONE, i64, f64, fp_to_sint>;

// If we need to perform a logical operation on i1 values, we need to		// If we need to perform a logical operation on i1 values, we need to
// use vector comparisons since there is only one SCC register. Vector		// use vector comparisons since there is only one SCC register. Vector
// comparisions still write to a pair of SGPRs, so treat these as		// comparisions still write to a pair of SGPRs, so treat these as
// 64-bit comparisons. When legalizing SGPR copies, instructions		// 64-bit comparisons. When legalizing SGPR copies, instructions
// resulting in the copies from SCC to these instructions will be		// resulting in the copies from SCC to these instructions will be
// moved to the VALU.		// moved to the VALU.
def : Pat <		def : Pat <
(i1 (and i1:$src0, i1:$src1)),		(i1 (and i1:$src0, i1:$src1)),
(S_AND_B64 $src0, $src1)		(S_AND_B64 $src0, $src1)
>;		>;

def : Pat <		def : Pat <
(i1 (or i1:$src0, i1:$src1)),		(i1 (or i1:$src0, i1:$src1)),
(S_OR_B64 $src0, $src1)		(S_OR_B64 $src0, $src1)
>;		>;

def : Pat <		def : Pat <
(i1 (xor i1:$src0, i1:$src1)),		(i1 (xor i1:$src0, i1:$src1)),
(S_XOR_B64 $src0, $src1)		(S_XOR_B64 $src0, $src1)
>;		>;
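
The comment above these three patterns says i1 values are treated as 64-bit quantities held in an SGPR pair. A minimal Python sketch of that idea follows, assuming one bit per lane and a 64-lane wavefront. The helper names are hypothetical and model only the bitwise behaviour, not the instruction selection itself.

```python
WAVE_SIZE = 64                      # assumption: one mask bit per lane
MASK64 = (1 << WAVE_SIZE) - 1


def to_lane_mask(bits):
    """Pack per-lane booleans (lane 0 first) into a 64-bit lane mask."""
    mask = 0
    for lane, bit in enumerate(bits):
        if bit:
            mask |= 1 << lane
    return mask & MASK64


def s_and_b64(a, b):
    """Model of a 64-bit scalar AND applied to two lane masks."""
    return (a & b) & MASK64


def s_or_b64(a, b):
    """Model of a 64-bit scalar OR applied to two lane masks."""
    return (a | b) & MASK64


def s_xor_b64(a, b):
    """Model of a 64-bit scalar XOR applied to two lane masks."""
    return (a ^ b) & MASK64
```

Under this model a logical operation on two i1 values is a single bitwise operation on the two masks, which is why the patterns select the B64 scalar instructions.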

def : Pat <		def : Pat <
(f32 (sint_to_fp i1:$src)),		(f32 (sint_to_fp i1:$src)),
(V_CNDMASK_B32_e64 (i32 0), CONST.FP32_NEG_ONE, $src)		(V_CNDMASK_B32_e64 (i32 0), (i32 CONST.FP32_NEG_ONE), $src)
>;		>;

def : Pat <		def : Pat <
(f32 (uint_to_fp i1:$src)),		(f32 (uint_to_fp i1:$src)),
(V_CNDMASK_B32_e64 (i32 0), CONST.FP32_ONE, $src)		(V_CNDMASK_B32_e64 (i32 0), (i32 CONST.FP32_ONE), $src)
>;		>;

def : Pat <		def : Pat <
(f64 (sint_to_fp i1:$src)),		(f64 (sint_to_fp i1:$src)),
(V_CVT_F64_I32_e32 (V_CNDMASK_B32_e64 (i32 0), (i32 -1), $src))		(V_CVT_F64_I32_e32 (V_CNDMASK_B32_e64 (i32 0), (i32 -1), $src))
>;		>;

def : Pat <		def : Pat <
(f64 (uint_to_fp i1:$src)),		(f64 (uint_to_fp i1:$src)),
(V_CVT_F64_U32_e32 (V_CNDMASK_B32_e64 (i32 0), (i32 1), $src))		(V_CVT_F64_U32_e32 (V_CNDMASK_B32_e64 (i32 0), (i32 1), $src))
>;		>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Miscellaneous Patterns		// Miscellaneous Patterns
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def : Pat <		def : Pat <
(i32 (trunc i64:$a)),		(i32 (trunc i64:$a)),
(EXTRACT_SUBREG $a, sub0)		(EXTRACT_SUBREG $a, sub0)
>;		>;

def : Pat <		def : Pat <
(i1 (trunc i32:$a)),		(i1 (trunc i32:$a)),
(V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1), $a), 1)		(V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1), $a), (i32 1))
>;		>;

def : Pat <		def : Pat <
(i1 (trunc i64:$a)),		(i1 (trunc i64:$a)),
(V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1),		(V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1),
(EXTRACT_SUBREG $a, sub0)), 1)		(i32 (EXTRACT_SUBREG $a, sub0))), (i32 1))
>;		>;

def : Pat <		def : Pat <
(i32 (bswap i32:$a)),		(i32 (bswap i32:$a)),
(V_BFI_B32 (S_MOV_B32 0x00ff00ff),		(V_BFI_B32 (S_MOV_B32 (i32 0x00ff00ff)),
(V_ALIGNBIT_B32 $a, $a, 24),		(V_ALIGNBIT_B32 $a, $a, (i32 24)),
(V_ALIGNBIT_B32 $a, $a, 8))		(V_ALIGNBIT_B32 $a, $a, (i32 8)))
>;		>;
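
The bswap pattern above combines two rotations with a bitfield insert. The Python sketch below models that combination so the constant 0x00ff00ff and the shift amounts 24 and 8 can be checked by hand. The function bodies are my reading of the two instructions (a 64-bit funnel shift and a mask select) and should be treated as assumptions, not as an ISA reference.

```python
MASK32 = 0xFFFFFFFF


def v_alignbit_b32(src0, src1, shift):
    """Assumed semantics: low 32 bits of ({src0, src1} >> shift)."""
    combined = ((src0 & MASK32) << 32) | (src1 & MASK32)
    return (combined >> (shift & 31)) & MASK32


def v_bfi_b32(mask, src1, src2):
    """Assumed semantics: take src1 where mask is 1, src2 where mask is 0."""
    return ((mask & src1) | (~mask & src2)) & MASK32


def bswap32(a):
    """Same operand order as the pattern: BFI(mask, rot by 24, rot by 8)."""
    return v_bfi_b32(0x00FF00FF,
                     v_alignbit_b32(a, a, 24),
                     v_alignbit_b32(a, a, 8))
```

With both alignbit sources equal, the funnel shift is a rotate right, so the mask picks bytes 0 and 2 from the rotate by 24 and bytes 1 and 3 from the rotate by 8.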

def : Pat <		def : Pat <
(f32 (select i1:$src2, f32:$src1, f32:$src0)),		(f32 (select i1:$src2, f32:$src1, f32:$src0)),
(V_CNDMASK_B32_e64 $src0, $src1, $src2)		(V_CNDMASK_B32_e64 $src0, $src1, $src2)
>;		>;

multiclass BFMPatterns <ValueType vt, InstSI BFM, InstSI MOV> {		multiclass BFMPatterns <ValueType vt, InstSI BFM, InstSI MOV> {
def : Pat <		def : Pat <
(vt (shl (vt (add (vt (shl 1, vt:$a)), -1)), vt:$b)),		(vt (shl (vt (add (vt (shl 1, vt:$a)), -1)), vt:$b)),
(BFM $a, $b)		(BFM $a, $b)
>;		>;

def : Pat <		def : Pat <
(vt (add (vt (shl 1, vt:$a)), -1)),		(vt (add (vt (shl 1, vt:$a)), -1)),
(BFM $a, (MOV 0))		(BFM $a, (MOV (i32 0)))
>;		>;
}		}

defm : BFMPatterns <i32, S_BFM_B32, S_MOV_B32>;		defm : BFMPatterns <i32, S_BFM_B32, S_MOV_B32>;
// FIXME: defm : BFMPatterns <i64, S_BFM_B64, S_MOV_B64>;		// FIXME: defm : BFMPatterns <i64, S_BFM_B64, S_MOV_B64>;

def : BFEPattern <V_BFE_U32, S_MOV_B32>;		def : BFEPattern <V_BFE_U32, S_MOV_B32>;

def : Pat<		def : Pat<
(fcanonicalize f32:$src),		(fcanonicalize f32:$src),
(V_MUL_F32_e64 0, CONST.FP32_ONE, 0, $src, 0, 0)		(V_MUL_F32_e64 0, (i32 CONST.FP32_ONE), 0, $src, 0, 0)
>;		>;

def : Pat<		def : Pat<
(fcanonicalize f64:$src),		(fcanonicalize f64:$src),
(V_MUL_F64 0, CONST.FP64_ONE, 0, $src, 0, 0)		(V_MUL_F64 0, CONST.FP64_ONE, 0, $src, 0, 0)
>;		>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
(V_ADD_F64
(V_CNDMASK_B64_PSEUDO		(V_CNDMASK_B64_PSEUDO
(V_MIN_F64		(V_MIN_F64
SRCMODS.NONE,		SRCMODS.NONE,
(V_FRACT_F64_e64 $mods, $x, DSTCLAMP.NONE, DSTOMOD.NONE),		(V_FRACT_F64_e64 $mods, $x, DSTCLAMP.NONE, DSTOMOD.NONE),
SRCMODS.NONE,		SRCMODS.NONE,
(V_MOV_B64_PSEUDO 0x3fefffffffffffff),		(V_MOV_B64_PSEUDO 0x3fefffffffffffff),
DSTCLAMP.NONE, DSTOMOD.NONE),		DSTCLAMP.NONE, DSTOMOD.NONE),
$x,		$x,
(V_CMP_CLASS_F64_e64 SRCMODS.NONE, $x, 3/*NaN*/)),		(V_CMP_CLASS_F64_e64 SRCMODS.NONE, $x, (i32 3 /*NaN*/))),
DSTCLAMP.NONE, DSTOMOD.NONE)		DSTCLAMP.NONE, DSTOMOD.NONE)
>;		>;
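
The literal 0x3fefffffffffffff passed to V_MOV_B64_PSEUDO above is a raw double bit pattern. The short Python sketch below decodes it; the helper name is hypothetical. It only illustrates what value the V_MIN_F64 clamps against and does not model the rest of the pattern, part of which is collapsed in this view.

```python
import struct


def f64_from_bits(bits):
    """Reinterpret a 64-bit integer as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# Largest double strictly below 1.0.
CLAMP = f64_from_bits(0x3FEFFFFFFFFFFFFF)
```

The decoded value is 1 - 2^-53, so taking the minimum with it keeps a fract result strictly below 1.0.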

} // End Predicates = [isSI]		} // End Predicates = [isSI]

//============================================================================//		//============================================================================//
// Miscellaneous Optimization Patterns		// Miscellaneous Optimization Patterns
//============================================================================//		//============================================================================//

lib/Target/AMDGPU/SIRegisterInfo.td

def SCC_CLASS : RegisterClass<"AMDGPU", [i1], 1, (add SCC)> {		def SCC_CLASS : RegisterClass<"AMDGPU", [i1], 1, (add SCC)> {
let CopyCost = -1;		let CopyCost = -1;
let isAllocatable = 0;		let isAllocatable = 0;
}		}

// TODO: Do we need to set DwarfRegAlias on register tuples?		// TODO: Do we need to set DwarfRegAlias on register tuples?

// SGPR 32-bit registers		// SGPR 32-bit registers
def SGPR_32 : RegisterClass<"AMDGPU", [i32, f32], 32,		def SGPR_32 : RegisterClass<"AMDGPU", [i32, f32, i16], 32,
(add (sequence "SGPR%u", 0, 103))> {		(add (sequence "SGPR%u", 0, 103))> {
let AllocationPriority = 1;		let AllocationPriority = 1;
}		}

// SGPR 64-bit registers		// SGPR 64-bit registers
def SGPR_64Regs : RegisterTuples<[sub0, sub1],		def SGPR_64Regs : RegisterTuples<[sub0, sub1],
[(add (decimate SGPR_32, 2)),		[(add (decimate SGPR_32, 2)),
(add (decimate (shl SGPR_32, 1), 2))]>;		(add (decimate (shl SGPR_32, 1), 2))]>;
// Trap handler TMP 128-bit registers		// Trap handler TMP 128-bit registers
def TTMP_128Regs : RegisterTuples<[sub0, sub1, sub2, sub3],		def TTMP_128Regs : RegisterTuples<[sub0, sub1, sub2, sub3],
[(add (decimate TTMP_32, 4)),		[(add (decimate TTMP_32, 4)),
(add (decimate (shl TTMP_32, 1), 4)),		(add (decimate (shl TTMP_32, 1), 4)),
(add (decimate (shl TTMP_32, 2), 4)),		(add (decimate (shl TTMP_32, 2), 4)),
(add (decimate (shl TTMP_32, 3), 4))]>;		(add (decimate (shl TTMP_32, 3), 4))]>;

// VGPR 32-bit registers		// VGPR 32-bit registers
def VGPR_32 : RegisterClass<"AMDGPU", [i32, f32], 32,		// i16 only on VI+
		def VGPR_32 : RegisterClass<"AMDGPU", [i32, f32, i16], 32,
(add (sequence "VGPR%u", 0, 255))> {		(add (sequence "VGPR%u", 0, 255))> {
let AllocationPriority = 1;		let AllocationPriority = 1;
let Size = 32;		let Size = 32;
}		}

// VGPR 64-bit registers		// VGPR 64-bit registers
def VGPR_64 : RegisterTuples<[sub0, sub1],		def VGPR_64 : RegisterTuples<[sub0, sub1],
[(add (trunc VGPR_32, 255)),		[(add (trunc VGPR_32, 255)),
// See comments in SIInstructions.td for more info.		// See comments in SIInstructions.td for more info.
def SReg_32_XM0 : RegisterClass<"AMDGPU", [i32, f32], 32,		def SReg_32_XM0 : RegisterClass<"AMDGPU", [i32, f32], 32,
(add SGPR_32, VCC_LO, VCC_HI, EXEC_LO, EXEC_HI, FLAT_SCR_LO, FLAT_SCR_HI,		(add SGPR_32, VCC_LO, VCC_HI, EXEC_LO, EXEC_HI, FLAT_SCR_LO, FLAT_SCR_HI,
TTMP_32, TMA_LO, TMA_HI, TBA_LO, TBA_HI)> {		TTMP_32, TMA_LO, TMA_HI, TBA_LO, TBA_HI)> {
let AllocationPriority = 1;		let AllocationPriority = 1;
}		}

// Register class for all scalar registers (SGPRs + Special Registers)		// Register class for all scalar registers (SGPRs + Special Registers)
def SReg_32 : RegisterClass<"AMDGPU", [i32, f32], 32,		def SReg_32 : RegisterClass<"AMDGPU", [i32, f32, i16], 32,
(add SReg_32_XM0, M0)> {		(add SReg_32_XM0, M0, VCC_LO, VCC_HI, EXEC_LO, EXEC_HI, FLAT_SCR_LO, FLAT_SCR_HI)> {
let AllocationPriority = 1;		let AllocationPriority = 1;
}		}

def SGPR_64 : RegisterClass<"AMDGPU", [v2i32, i64, f64], 32, (add SGPR_64Regs)> {		def SGPR_64 : RegisterClass<"AMDGPU", [v2i32, i64, f64], 32, (add SGPR_64Regs)> {
let AllocationPriority = 2;		let AllocationPriority = 2;
}		}

def TTMP_64 : RegisterClass<"AMDGPU", [v2i32, i64, f64], 32, (add TTMP_64Regs)> {		def TTMP_64 : RegisterClass<"AMDGPU", [v2i32, i64, f64], 32, (add TTMP_64Regs)> {
def VReg_512 : RegisterClass<"AMDGPU", [v16i32, v16f32], 32, (add VGPR_512)> {
let CopyCost = 16;		let CopyCost = 16;
let AllocationPriority = 6;		let AllocationPriority = 6;
}		}

def VReg_1 : RegisterClass<"AMDGPU", [i1], 32, (add VGPR_32)> {		def VReg_1 : RegisterClass<"AMDGPU", [i1], 32, (add VGPR_32)> {
let Size = 32;		let Size = 32;
}		}

def VS_32 : RegisterClass<"AMDGPU", [i32, f32], 32, (add VGPR_32, SReg_32)> {		def VS_32 : RegisterClass<"AMDGPU", [i32, f32, i16], 32, (add VGPR_32, SReg_32)> {
let isAllocatable = 0;		let isAllocatable = 0;
}		}

def VS_64 : RegisterClass<"AMDGPU", [i64, f64], 32, (add VReg_64, SReg_64)> {		def VS_64 : RegisterClass<"AMDGPU", [i64, f64], 32, (add VReg_64, SReg_64)> {
let isAllocatable = 0;		let isAllocatable = 0;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

lib/Target/AMDGPU/SOPInstructions.td

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// SOP1 Patterns			// SOP1 Patterns
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def : Pat <			def : Pat <
	(i64 (ctpop i64:$src)),			(i64 (ctpop i64:$src)),
	(i64 (REG_SEQUENCE SReg_64,			(i64 (REG_SEQUENCE SReg_64,
	(i32 (COPY_TO_REGCLASS (S_BCNT1_I32_B64 $src), SReg_32)), sub0,			(i32 (COPY_TO_REGCLASS (S_BCNT1_I32_B64 $src), SReg_32)), sub0,
	(S_MOV_B32 0), sub1))			(S_MOV_B32 (i32 0)), sub1))
	>;			>;

	def : Pat <			def : Pat <
	(i32 (smax i32:$x, (i32 (ineg i32:$x)))),			(i32 (smax i32:$x, (i32 (ineg i32:$x)))),
	(S_ABS_I32 $x)			(S_ABS_I32 $x)
	>;			>;

				def : Pat <
				(i16 imm:$imm),
				(S_MOV_B32 imm:$imm)
				>;

				// Same as a 32-bit inreg
				def : Pat<
				(i32 (sext i16:$src)),
				(S_SEXT_I32_I16 $src)
				>;


	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// SOP2 Patterns			// SOP2 Patterns
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	// V_ADD_I32_e32/S_ADD_U32 produces carry in VCC/SCC. For the vector			// V_ADD_I32_e32/S_ADD_U32 produces carry in VCC/SCC. For the vector
	// case, the sgpr-copies pass will fix this to use the vector version.			// case, the sgpr-copies pass will fix this to use the vector version.
	def : Pat <			def : Pat <
	(i32 (addc i32:$src0, i32:$src1)),			(i32 (addc i32:$src0, i32:$src1)),
	(S_ADD_U32 $src0, $src1)			(S_ADD_U32 $src0, $src1)
	>;			>;

				// FIXME: We need to use COPY_TO_REGCLASS to work-around the fact that
				// REG_SEQUENCE patterns don't support instructions with multiple
				// outputs.
				def : Pat<
				(i64 (zext i16:$src)),
				(REG_SEQUENCE SReg_64,
				(i32 (COPY_TO_REGCLASS (S_AND_B32 $src, (S_MOV_B32 (i32 0xffff))), SGPR_32)), sub0,
				(S_MOV_B32 (i32 0)), sub1)
				>;

				def : Pat <
				(i64 (sext i16:$src)),
				(REG_SEQUENCE SReg_64, (i32 (S_SEXT_I32_I16 $src)), sub0,
				(i32 (COPY_TO_REGCLASS (S_ASHR_I32 (i32 (S_SEXT_I32_I16 $src)), (S_MOV_B32 (i32 31))), SGPR_32)), sub1)
				>;

				def : Pat<
				(i32 (zext i16:$src)),
				(S_AND_B32 (S_MOV_B32 (i32 0xffff)), $src)
				>;
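
The three scalar extension patterns added above build a wider value from an i16 held in a 32-bit register. A minimal Python sketch of the arithmetic follows, assuming the upper 16 bits of the source register are unspecified. The helper names mirror the instruction names, but the bodies are simplified models written for illustration.

```python
MASK32 = 0xFFFFFFFF


def s_sext_i32_i16(x):
    """Sign-extend the low 16 bits of x into a 32-bit value."""
    x &= 0xFFFF
    return (x | 0xFFFF0000) if x & 0x8000 else x


def s_ashr_i32(x, shift):
    """32-bit arithmetic shift right, returned as an unsigned 32-bit value."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> shift) & MASK32


def sext_i16_to_i64(src):
    """sub0 = sext(src); sub1 = sub0 >> 31 (arithmetic), as in the pattern."""
    lo = s_sext_i32_i16(src)
    hi = s_ashr_i32(lo, 31)
    return (hi << 32) | lo


def zext_i16_to_i64(src):
    """sub0 = src & 0xffff; sub1 = 0, as in the pattern."""
    lo = src & 0xFFFF
    hi = 0
    return (hi << 32) | lo
```

The shift by 31 replicates the sign bit across the whole upper half, which is the same trick the existing i32 to i64 sext pattern uses.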



	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// SOPP Patterns			// SOPP Patterns
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def : Pat <			def : Pat <
	(int_amdgcn_s_waitcnt i32:$simm16),			(int_amdgcn_s_waitcnt i32:$simm16),
	(S_WAITCNT (as_i16imm $simm16))			(S_WAITCNT (as_i16imm $simm16))
	>;			>;

lib/Target/AMDGPU/VIInstructions.td

This file was deleted.

	//===-- VIInstructions.td - VI Instruction Defintions ---------------------===//
	//
	// The LLVM Compiler Infrastructure
	//
	// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.
	//
	//===----------------------------------------------------------------------===//
	// Instruction definitions for VI and newer.
	//===----------------------------------------------------------------------===//

lib/Target/AMDGPU/VOP1Instructions.td

	defm V_TRUNC_F16 : VOP1Inst <"v_trunc_f16", VOP_F16_F16>;			defm V_TRUNC_F16 : VOP1Inst <"v_trunc_f16", VOP_F16_F16>;
	defm V_RNDNE_F16 : VOP1Inst <"v_rndne_f16", VOP_F16_F16>;			defm V_RNDNE_F16 : VOP1Inst <"v_rndne_f16", VOP_F16_F16>;
	defm V_FRACT_F16 : VOP1Inst <"v_fract_f16", VOP_F16_F16>;			defm V_FRACT_F16 : VOP1Inst <"v_fract_f16", VOP_F16_F16>;
	defm V_SIN_F16 : VOP1Inst <"v_sin_f16", VOP_F16_F16>;			defm V_SIN_F16 : VOP1Inst <"v_sin_f16", VOP_F16_F16>;
	defm V_COS_F16 : VOP1Inst <"v_cos_f16", VOP_F16_F16>;			defm V_COS_F16 : VOP1Inst <"v_cos_f16", VOP_F16_F16>;

	}			}

				let Predicates = [isVI] in {

				def : Pat<
				(f32 (f16_to_fp i16:$src)),
				(V_CVT_F32_F16_e32 $src)
				>;

				def : Pat<
				(i16 (fp_to_f16 f32:$src)),
				(V_CVT_F16_F32_e32 $src)
				>;

				}

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Target			// Target
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// SI			// SI
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def V_MOVRELD_B32_V2 : V_MOVRELD_B32_pseudo<VReg_64>;			def V_MOVRELD_B32_V2 : V_MOVRELD_B32_pseudo<VReg_64>;
	def V_MOVRELD_B32_V4 : V_MOVRELD_B32_pseudo<VReg_128>;			def V_MOVRELD_B32_V4 : V_MOVRELD_B32_pseudo<VReg_128>;
	def V_MOVRELD_B32_V8 : V_MOVRELD_B32_pseudo<VReg_256>;			def V_MOVRELD_B32_V8 : V_MOVRELD_B32_pseudo<VReg_256>;
	def V_MOVRELD_B32_V16 : V_MOVRELD_B32_pseudo<VReg_512>;			def V_MOVRELD_B32_V16 : V_MOVRELD_B32_pseudo<VReg_512>;

	let Predicates = [isVI] in {			let Predicates = [isVI] in {

	def : Pat <			def : Pat <
	(int_amdgcn_mov_dpp i32:$src, imm:$dpp_ctrl, imm:$row_mask, imm:$bank_mask,			(i32 (int_amdgcn_mov_dpp i32:$src, imm:$dpp_ctrl, imm:$row_mask, imm:$bank_mask,
	imm:$bound_ctrl),			imm:$bound_ctrl)),
	(V_MOV_B32_dpp $src, (as_i32imm $dpp_ctrl), (as_i32imm $row_mask),			(V_MOV_B32_dpp $src, (as_i32imm $dpp_ctrl), (as_i32imm $row_mask),
	(as_i32imm $bank_mask), (as_i1imm $bound_ctrl))			(as_i32imm $bank_mask), (as_i1imm $bound_ctrl))
	>;			>;


				def : Pat<
				(i32 (anyext i16:$src)),
				(COPY $src)
				>;

				def : Pat<
				(i64 (anyext i16:$src)),
				(REG_SEQUENCE VReg_64,
				(i32 (COPY $src)), sub0,
				(V_MOV_B32_e32 (i32 0)), sub1)
				>;

				def : Pat<
				(i16 (trunc i32:$src)),
				(COPY $src)
				>;

				def : Pat<
				(i1 (trunc i16:$src)),
				(COPY $src)
				>;


				def : Pat <
				(i16 (trunc i64:$src)),
				(EXTRACT_SUBREG $src, sub0)
				>;


				/*
				def : ZExt_i16_i1_Pat<zext>;
				def : ZExt_i16_i1_Pat<sext>;
				def : ZExt_i16_i1_Pat<anyext>;
				*/
				arsenm (Done): Commented out code

	} // End Predicates = [isVI]			} // End Predicates = [isVI]

lib/Target/AMDGPU/VOP2Instructions.td

	defm V_MAX_U16 : VOP2Inst <"v_max_u16", VOP_I16_I16_I16>;			defm V_MAX_U16 : VOP2Inst <"v_max_u16", VOP_I16_I16_I16>;
	defm V_MAX_I16 : VOP2Inst <"v_max_i16", VOP_I16_I16_I16>;			defm V_MAX_I16 : VOP2Inst <"v_max_i16", VOP_I16_I16_I16>;
	defm V_MIN_U16 : VOP2Inst <"v_min_u16", VOP_I16_I16_I16>;			defm V_MIN_U16 : VOP2Inst <"v_min_u16", VOP_I16_I16_I16>;
	defm V_MIN_I16 : VOP2Inst <"v_min_i16", VOP_I16_I16_I16>;			defm V_MIN_I16 : VOP2Inst <"v_min_i16", VOP_I16_I16_I16>;
	} // End isCommutable = 1			} // End isCommutable = 1

	} // End SubtargetPredicate = isVI			} // End SubtargetPredicate = isVI

				let Predicates = [isVI] in {
				arsenm (Done): This should only need to be around the actual defs, not the multiclasses

				// Note: 16-bit instructions produce a 0 result in the high 16-bits.
				multiclass Arithmetic_i16_Pats <SDPatternOperator op, Instruction inst> {

				def : Pat<
				(op i16:$src0, i16:$src1),
				(inst i16:$src0, i16:$src1)
				arsenm (Done): I don't think you need to repeat the type in the output here
				>;

				def : Pat<
				(i32 (zext (op i16:$src0, i16:$src1))),
				(inst i16:$src0, i16:$src1)
				>;

				def : Pat<
				(i64 (zext (op i16:$src0, i16:$src1))),
				(REG_SEQUENCE VReg_64,
				(inst i16:$src0, i16:$src1), sub0,
				(V_MOV_B32_e32 (i32 0)), sub1)
				>;

				}
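
The note at the top of this multiclass is what justifies the second and third patterns: if a 16-bit instruction writes zeros to the high 16 bits of its 32-bit destination, a zext of the result needs no extra instruction. A minimal Python sketch of that reasoning, taking the note as an assumption and using hypothetical helper names:

```python
def v_add_u16(a, b):
    """Model of a 16-bit add whose 32-bit result has a zero high half
    (assumption taken from the comment in the patch)."""
    return ((a & 0xFFFF) + (b & 0xFFFF)) & 0xFFFF


def zext_i16_to_i32(x):
    """Reference zero-extension of an i16 held in a 32-bit register."""
    return x & 0xFFFF


def zext_result_to_i64(lo):
    """REG_SEQUENCE model: sub0 = instruction result, sub1 = 0."""
    hi = 0
    return (hi << 32) | lo
```

In this model the explicit zero-extension is the identity on the add result, so the i32 pattern can emit the instruction alone and the i64 pattern only needs a zero for sub1.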

				multiclass Bits_OpsRev_i16_Pats <SDPatternOperator op, Instruction inst> {

				def : Pat<
				(op i16:$src0, i32:$src1),
				(inst $src1, $src0)
				>;

				def : Pat<
				(i32 (zext (op i16:$src0, i32:$src1))),
				(inst $src1, $src0)
				>;


				def : Pat<
				(i64 (zext (op i16:$src0, i32:$src1))),
				(REG_SEQUENCE VReg_64,
				(inst $src1, $src0), sub0,
				(V_MOV_B32_e32 (i32 0)), sub1)
				>;
				}
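
The "Rev" patterns above swap the operands because the reversed shift instructions take the shift amount first and the value second. A small Python sketch of that swap follows. The helper names are hypothetical, and masking the shift amount to four bits is an assumption about the hardware, not something stated in this patch.

```python
def v_lshlrev_b16(shift, value):
    """Reversed left shift: first operand is the amount, second the value."""
    return ((value & 0xFFFF) << (shift & 15)) & 0xFFFF


def v_lshrrev_b16(shift, value):
    """Reversed logical right shift on the low 16 bits."""
    return (value & 0xFFFF) >> (shift & 15)


def v_ashrrev_b16(shift, value):
    """Reversed arithmetic right shift on the low 16 bits."""
    value &= 0xFFFF
    signed = value - 0x10000 if value & 0x8000 else value
    return (signed >> (shift & 15)) & 0xFFFF


def select_shift(inst, src0, src1):
    """Mirror of the pattern: (op src0, src1) becomes (inst src1, src0)."""
    return inst(src1, src0)
```

For example, `select_shift(v_lshlrev_b16, value, amount)` computes `value << amount`, matching the IR operand order on the input side of the pattern.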

				defm : Arithmetic_i16_Pats<add, V_ADD_U16_e32>;
				defm : Arithmetic_i16_Pats<mul, V_MUL_LO_U16_e32>;
				defm : Arithmetic_i16_Pats<sub, V_SUB_U16_e32>;
				defm : Arithmetic_i16_Pats<smin, V_MIN_I16_e32>;
				defm : Arithmetic_i16_Pats<smax, V_MAX_I16_e32>;
				defm : Arithmetic_i16_Pats<umin, V_MIN_U16_e32>;
				defm : Arithmetic_i16_Pats<umax, V_MAX_U16_e32>;

				defm : Arithmetic_i16_Pats<and, V_AND_B32_e32>;
				defm : Arithmetic_i16_Pats<or, V_OR_B32_e32>;
				defm : Arithmetic_i16_Pats<xor, V_XOR_B32_e32>;

				defm : Bits_OpsRev_i16_Pats<shl, V_LSHLREV_B16_e32>;
				defm : Bits_OpsRev_i16_Pats<srl, V_LSHRREV_B16_e32>;
				defm : Bits_OpsRev_i16_Pats<sra, V_ASHRREV_B16_e32>;

				class ZExt_i16_i1_Pat <SDNode ext> : Pat <
				(i16 (ext i1:$src)),
				(V_CNDMASK_B32_e64 (i32 0), (i32 1), $src)
				>;

				def : ZExt_i16_i1_Pat<zext>;
				def : ZExt_i16_i1_Pat<sext>;
				def : ZExt_i16_i1_Pat<anyext>;

				} // End Predicates = [isVI]

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// SI			// SI
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	let AssemblerPredicates = [isSICI], DecoderNamespace = "SICI" in {			let AssemblerPredicates = [isSICI], DecoderNamespace = "SICI" in {

	multiclass VOP2_Real_si <bits<6> op> {			multiclass VOP2_Real_si <bits<6> op> {
	def _si :			def _si :

lib/Target/AMDGPU/VOP3Instructions.td

	let isCommutable = 1 in {			let isCommutable = 1 in {
	def V_MAD_F16 : VOP3Inst <"v_mad_f16", VOP3_Profile<VOP_F16_F16_F16_F16>>;			def V_MAD_F16 : VOP3Inst <"v_mad_f16", VOP3_Profile<VOP_F16_F16_F16_F16>>;
	def V_MAD_U16 : VOP3Inst <"v_mad_u16", VOP3_Profile<VOP_I16_I16_I16_I16>>;			def V_MAD_U16 : VOP3Inst <"v_mad_u16", VOP3_Profile<VOP_I16_I16_I16_I16>>;
	def V_MAD_I16 : VOP3Inst <"v_mad_i16", VOP3_Profile<VOP_I16_I16_I16_I16>>;			def V_MAD_I16 : VOP3Inst <"v_mad_i16", VOP3_Profile<VOP_I16_I16_I16_I16>>;
	}			}

	} // End SubtargetPredicate = isVI			} // End SubtargetPredicate = isVI

				def : Pat <
				(i16 (select i1:$src0, i16:$src1, i16:$src2)),
				(V_CNDMASK_B32_e64 $src2, $src1, $src0)
				>;

				let Predicates = [isVI] in {

				multiclass Tenary_i16_Pats <SDPatternOperator op1, SDPatternOperator op2, Instruction inst, SDPatternOperator op3> {
				arsenm (Not Done): Line wrapping
				def : Pat<
				(op2 (op1 i16:$src0, i16:$src1), i16:$src2),
				(inst i16:$src0, i16:$src1, i16:$src2)
				>;

				def : Pat<
				(i32 (op3 (op2 (op1 i16:$src0, i16:$src1), i16:$src2))),
				(inst i16:$src0, i16:$src1, i16:$src2)
				>;

				def : Pat<
				(i64 (op3 (op2 (op1 i16:$src0, i16:$src1), i16:$src2))),
				(REG_SEQUENCE VReg_64,
				(inst i16:$src0, i16:$src1, i16:$src2), sub0,
				(V_MOV_B32_e32 (i32 0)), sub1)
				>;
				}

				defm: Tenary_i16_Pats<mul, add, V_MAD_U16, zext>;
				defm: Tenary_i16_Pats<mul, add, V_MAD_I16, sext>;

				} // End Predicates = [isVI]


	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Target			// Target
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// SI			// SI
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

test/CodeGen/AMDGPU/add.i16.ll

This file was added.

				; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=GCN %s

				; FIXME: Need to handle non-uniform case for function below (load without gep).
				; GCN-LABEL: {{^}}v_test_add_i16:
				; VI: flat_load_ushort [[A:v[0-9]+]]
				; VI: flat_load_ushort [[B:v[0-9]+]]
				; VI: v_add_u16_e32 [[ADD:v[0-9]+]], [[A]], [[B]]
				; VI-NEXT: buffer_store_short [[ADD]]
				define void @v_test_add_i16(i16 addrspace(1)* %out, i16 addrspace(1)* %in0, i16 addrspace(1)* %in1) #1 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%gep.out = getelementptr inbounds i16, i16 addrspace(1)* %out, i32 %tid
				%gep.in0 = getelementptr inbounds i16, i16 addrspace(1)* %in0, i32 %tid
				%gep.in1 = getelementptr inbounds i16, i16 addrspace(1)* %in1, i32 %tid
				%a = load volatile i16, i16 addrspace(1)* %gep.in0
				%b = load volatile i16, i16 addrspace(1)* %gep.in1
				%add = add i16 %a, %b
				store i16 %add, i16 addrspace(1)* %out
				ret void
				}

				; FIXME: Need to handle non-uniform case for function below (load without gep).
				; GCN-LABEL: {{^}}v_test_add_i16_constant:
				; VI: flat_load_ushort [[A:v[0-9]+]]
				; VI: v_add_u16_e32 [[ADD:v[0-9]+]], 0x7b, [[A]]
				; VI-NEXT: buffer_store_short [[ADD]]
				define void @v_test_add_i16_constant(i16 addrspace(1)* %out, i16 addrspace(1)* %in0) #1 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%gep.out = getelementptr inbounds i16, i16 addrspace(1)* %out, i32 %tid
				%gep.in0 = getelementptr inbounds i16, i16 addrspace(1)* %in0, i32 %tid
				%a = load volatile i16, i16 addrspace(1)* %gep.in0
				%add = add i16 %a, 123
				store i16 %add, i16 addrspace(1)* %out
				ret void
				}

				; FIXME: Need to handle non-uniform case for function below (load without gep).
				; GCN-LABEL: {{^}}v_test_add_i16_neg_constant:
				; VI: flat_load_ushort [[A:v[0-9]+]]
				; VI: v_add_u16_e32 [[ADD:v[0-9]+]], 0xfffffcb3, [[A]]
				; VI-NEXT: buffer_store_short [[ADD]]
				define void @v_test_add_i16_neg_constant(i16 addrspace(1)* %out, i16 addrspace(1)* %in0) #1 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%gep.out = getelementptr inbounds i16, i16 addrspace(1)* %out, i32 %tid
				%gep.in0 = getelementptr inbounds i16, i16 addrspace(1)* %in0, i32 %tid
				%a = load volatile i16, i16 addrspace(1)* %gep.in0
				%add = add i16 %a, -845
				store i16 %add, i16 addrspace(1)* %out
				ret void
				}
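
The CHECK lines in these constant tests expect 0x7b for 123, 0xfffffcb3 for -845, and a plain -1 in the inline-constant test. The Python sketch below shows how those operand spellings can be derived. The inline range of -16 to 64 is stated here as an assumption about the target's integer inline constants, and the helper names are hypothetical.

```python
def is_inline_int(value):
    """Assumed range of integer inline constants: -16..64 inclusive."""
    return -16 <= value <= 64


def expected_operand(value):
    """Spell a constant the way the CHECK lines in these tests do:
    inline constants in decimal, everything else as a 32-bit hex literal."""
    if is_inline_int(value):
        return str(value)
    return hex(value & 0xFFFFFFFF)
```

Note that the negative literal is printed as a sign-extended 32-bit value even though the operation is 16 bits wide.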

				; FIXME: Need to handle non-uniform case for function below (load without gep).
				; GCN-LABEL: {{^}}v_test_add_i16_inline_neg1:
				; VI: flat_load_ushort [[A:v[0-9]+]]
				; VI: v_add_u16_e32 [[ADD:v[0-9]+]], -1, [[A]]
				; VI-NEXT: buffer_store_short [[ADD]]
				define void @v_test_add_i16_inline_neg1(i16 addrspace(1)* %out, i16 addrspace(1)* %in0) #1 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%gep.out = getelementptr inbounds i16, i16 addrspace(1)* %out, i32 %tid
				%gep.in0 = getelementptr inbounds i16, i16 addrspace(1)* %in0, i32 %tid
				%a = load volatile i16, i16 addrspace(1)* %gep.in0
				%add = add i16 %a, -1
				store i16 %add, i16 addrspace(1)* %out
				ret void
				}

				; FIXME: Need to handle non-uniform case for function below (load without gep).
				; GCN-LABEL: {{^}}v_test_add_i16_zext_to_i32:
				; VI: flat_load_ushort [[A:v[0-9]+]]
				; VI: flat_load_ushort [[B:v[0-9]+]]
				; VI: v_add_u16_e32 [[ADD:v[0-9]+]], [[A]], [[B]]
				; VI-NEXT: buffer_store_dword [[ADD]]
				define void @v_test_add_i16_zext_to_i32(i32 addrspace(1)* %out, i16 addrspace(1)* %in0, i16 addrspace(1)* %in1) #1 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%gep.out = getelementptr inbounds i32, i32 addrspace(1)* %out, i32 %tid
				%gep.in0 = getelementptr inbounds i16, i16 addrspace(1)* %in0, i32 %tid
				%gep.in1 = getelementptr inbounds i16, i16 addrspace(1)* %in1, i32 %tid
				%a = load volatile i16, i16 addrspace(1)* %gep.in0
				%b = load volatile i16, i16 addrspace(1)* %gep.in1
				%add = add i16 %a, %b
				%ext = zext i16 %add to i32
				store i32 %ext, i32 addrspace(1)* %out
				ret void
				}

				; FIXME: Need to handle non-uniform case for function below (load without gep).
				; GCN-LABEL: {{^}}v_test_add_i16_zext_to_i64:
				; VI: flat_load_ushort [[A:v[0-9]+]]
				; VI: flat_load_ushort [[B:v[0-9]+]]
				; VI-DAG: v_add_u16_e32 v[[ADD:[0-9]+]], [[A]], [[B]]
				; VI-DAG: v_mov_b32_e32 v[[VZERO:[0-9]+]], 0
				; VI: buffer_store_dwordx2 v{{\[}}[[ADD]]:[[VZERO]]{{\]}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0{{$}}
				define void @v_test_add_i16_zext_to_i64(i64 addrspace(1)* %out, i16 addrspace(1)* %in0, i16 addrspace(1)* %in1) #1 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%gep.out = getelementptr inbounds i64, i64 addrspace(1)* %out, i32 %tid
				%gep.in0 = getelementptr inbounds i16, i16 addrspace(1)* %in0, i32 %tid
				%gep.in1 = getelementptr inbounds i16, i16 addrspace(1)* %in1, i32 %tid
				%a = load volatile i16, i16 addrspace(1)* %gep.in0
				%b = load volatile i16, i16 addrspace(1)* %gep.in1
				%add = add i16 %a, %b
				%ext = zext i16 %add to i64
				store i64 %ext, i64 addrspace(1)* %out
				ret void
				}

				; FIXME: Need to handle non-uniform case for function below (load without gep).
				; GCN-LABEL: {{^}}v_test_add_i16_sext_to_i32:
				; VI: flat_load_ushort [[A:v[0-9]+]]
				; VI: flat_load_ushort [[B:v[0-9]+]]
				; VI: v_add_u16_e32 [[ADD:v[0-9]+]], [[A]], [[B]]
				; VI-NEXT: v_bfe_i32 [[SEXT:v[0-9]+]], [[ADD]], 0, 16
				; VI-NEXT: buffer_store_dword [[SEXT]]
				define void @v_test_add_i16_sext_to_i32(i32 addrspace(1)* %out, i16 addrspace(1)* %in0, i16 addrspace(1)* %in1) #1 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%gep.out = getelementptr inbounds i32, i32 addrspace(1)* %out, i32 %tid
				%gep.in0 = getelementptr inbounds i16, i16 addrspace(1)* %in0, i32 %tid
				%gep.in1 = getelementptr inbounds i16, i16 addrspace(1)* %in1, i32 %tid
				%a = load i16, i16 addrspace(1)* %gep.in0
				%b = load i16, i16 addrspace(1)* %gep.in1
				%add = add i16 %a, %b
				%ext = sext i16 %add to i32
				store i32 %ext, i32 addrspace(1)* %out
				ret void
				}

				; FIXME: Need to handle non-uniform case for function below (load without gep).
				; GCN-LABEL: {{^}}v_test_add_i16_sext_to_i64:
				; VI: flat_load_ushort [[A:v[0-9]+]]
				; VI: flat_load_ushort [[B:v[0-9]+]]
				; VI: v_add_u16_e32 [[ADD:v[0-9]+]], [[A]], [[B]]
				; VI-NEXT: v_bfe_i32 v[[LO:[0-9]+]], [[ADD]], 0, 16
				; VI-NEXT: v_ashrrev_i32_e32 v[[HI:[0-9]+]], 31, v[[LO]]
				; VI-NEXT: buffer_store_dwordx2 v{{\[}}[[LO]]:[[HI]]{{\]}}
				define void @v_test_add_i16_sext_to_i64(i64 addrspace(1)* %out, i16 addrspace(1)* %in0, i16 addrspace(1)* %in1) #1 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%gep.out = getelementptr inbounds i64, i64 addrspace(1)* %out, i32 %tid
				%gep.in0 = getelementptr inbounds i16, i16 addrspace(1)* %in0, i32 %tid
				%gep.in1 = getelementptr inbounds i16, i16 addrspace(1)* %in1, i32 %tid
				%a = load i16, i16 addrspace(1)* %gep.in0
				%b = load i16, i16 addrspace(1)* %gep.in1
				%add = add i16 %a, %b
				%ext = sext i16 %add to i64
				store i64 %ext, i64 addrspace(1)* %out
				ret void
				}

				declare i32 @llvm.amdgcn.workitem.id.x() #0

				attributes #0 = { nounwind readnone }
				attributes #1 = { nounwind }

test/CodeGen/AMDGPU/anyext.ll

	; RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s			; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=SI %s
	; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=VI %s

	; CHECK-LABEL: {{^}}anyext_i1_i32:			declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	; CHECK: v_cndmask_b32_e64			declare i32 @llvm.amdgcn.workitem.id.y() nounwind readnone

				; GCN-LABEL: {{^}}anyext_i1_i32:
				; GCN: v_cndmask_b32_e64
	define void @anyext_i1_i32(i32 addrspace(1)* %out, i32 %cond) {			define void @anyext_i1_i32(i32 addrspace(1)* %out, i32 %cond) {
	entry:			entry:
	%0 = icmp eq i32 %cond, 0			%tmp = icmp eq i32 %cond, 0
	%1 = zext i1 %0 to i8			%tmp1 = zext i1 %tmp to i8
	%2 = xor i8 %1, -1			%tmp2 = xor i8 %tmp1, -1
	%3 = and i8 %2, 1			%tmp3 = and i8 %tmp2, 1
	%4 = zext i8 %3 to i32			%tmp4 = zext i8 %tmp3 to i32
	store i32 %4, i32 addrspace(1)* %out			store i32 %tmp4, i32 addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}s_anyext_i16_i32:
				; VI: v_add_u16_e32 [[ADD:v[0-9]+]],
				; VI: v_xor_b32_e32 [[XOR:v[0-9]+]], -1, [[ADD]]
				; VI: v_and_b32_e32 [[AND:v[0-9]+]], 1, [[XOR]]
				; VI: buffer_store_dword [[AND]]
				define void @s_anyext_i16_i32(i32 addrspace(1)* %out, i16 addrspace(1)* %a, i16 addrspace(1)* %b) {
				entry:
				%tid.x = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.y = call i32 @llvm.amdgcn.workitem.id.y()
				%a.ptr = getelementptr i16, i16 addrspace(1)* %a, i32 %tid.x
				%b.ptr = getelementptr i16, i16 addrspace(1)* %b, i32 %tid.y
				%a.l = load i16, i16 addrspace(1)* %a.ptr
				%b.l = load i16, i16 addrspace(1)* %b.ptr
				%tmp = add i16 %a.l, %b.l
				%tmp1 = trunc i16 %tmp to i8
				%tmp2 = xor i8 %tmp1, -1
				%tmp3 = and i8 %tmp2, 1
				%tmp4 = zext i8 %tmp3 to i32
				store i32 %tmp4, i32 addrspace(1)* %out
	ret void			ret void
	}			}

test/CodeGen/AMDGPU/bitreverse.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
				; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=FUNC %s

	declare i16 @llvm.bitreverse.i16(i16) #1			declare i16 @llvm.bitreverse.i16(i16) #1
	declare i32 @llvm.bitreverse.i32(i32) #1			declare i32 @llvm.bitreverse.i32(i32) #1
	declare i64 @llvm.bitreverse.i64(i64) #1			declare i64 @llvm.bitreverse.i64(i64) #1

	declare <2 x i32> @llvm.bitreverse.v2i32(<2 x i32>) #1			declare <2 x i32> @llvm.bitreverse.v2i32(<2 x i32>) #1
	declare <4 x i32> @llvm.bitreverse.v4i32(<4 x i32>) #1			declare <4 x i32> @llvm.bitreverse.v4i32(<4 x i32>) #1

	declare <2 x i64> @llvm.bitreverse.v2i64(<2 x i64>) #1			declare <2 x i64> @llvm.bitreverse.v2i64(<2 x i64>) #1
	declare <4 x i64> @llvm.bitreverse.v4i64(<4 x i64>) #1			declare <4 x i64> @llvm.bitreverse.v4i64(<4 x i64>) #1

	; FUNC-LABEL: {{^}}s_brev_i16:			; FUNC-LABEL: {{^}}s_brev_i16:
	; SI: s_brev_b32			; SI: s_brev_b32
	define void @s_brev_i16(i16 addrspace(1)* noalias %out, i16 %val) #0 {			define void @s_brev_i16(i16 addrspace(1)* noalias %out, i16 %val) #0 {
	%brev = call i16 @llvm.bitreverse.i16(i16 %val) #1			%brev = call i16 @llvm.bitreverse.i16(i16 %val) #1
	store i16 %brev, i16 addrspace(1)* %out			store i16 %brev, i16 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}v_brev_i16:			; FUNC-LABEL: {{^}}v_brev_i16:
	; SI: v_bfrev_b32_e32			; SI: v_bfrev_b32_e32

test/CodeGen/AMDGPU/cgp-bitfield-extract.ll

	; OPT: %1 = lshr i16 %arg1, 4			; OPT: %1 = lshr i16 %arg1, 4
	; OPT-NEXT: %val1 = and i16 %1, 127			; OPT-NEXT: %val1 = and i16 %1, 127
	; OPT: br label			; OPT: br label

	; OPT: ret:			; OPT: ret:
	; OPT: store			; OPT: store
	; OPT: ret			; OPT: ret

				; For GFX8: since i16 is legal type, we cannot sink lshr into BBs.

	; GCN-LABEL: {{^}}sink_ubfe_i16:			; GCN-LABEL: {{^}}sink_ubfe_i16:
	; GCN-NOT: lshr			; GCN-NOT: lshr
				; VI: s_bfe_u32 s0, s0, 0xc0004
	; GCN: s_cbranch_vccnz			; GCN: s_cbranch_vccnz

	; GCN: s_bfe_u32 s{{[0-9]+}}, s{{[0-9]+}}, 0x80004			; SI: s_bfe_u32 s{{[0-9]+}}, s{{[0-9]+}}, 0x80004
				; VI: s_and_b32 s0, s0, 0xff

	; GCN: BB2_2:			; GCN: BB2_2:
	; GCN: s_bfe_u32 s{{[0-9]+}}, s{{[0-9]+}}, 0x70004			; SI: s_bfe_u32 s{{[0-9]+}}, s{{[0-9]+}}, 0x70004
				; VI: s_and_b32 s0, s0, 0x7f

	; GCN: BB2_3:			; GCN: BB2_3:
	; GCN: buffer_store_short			; GCN: buffer_store_short
	; GCN: s_endpgm			; GCN: s_endpgm
	define void @sink_ubfe_i16(i16 addrspace(1)* %out, i16 %arg1) #0 {			define void @sink_ubfe_i16(i16 addrspace(1)* %out, i16 %arg1) #0 {
	entry:			entry:
	%shr = lshr i16 %arg1, 4			%shr = lshr i16 %arg1, 4
	br i1 undef, label %bb0, label %bb1			br i1 undef, label %bb0, label %bb1

test/CodeGen/AMDGPU/copy-illegal-type.ll

	; RUN: llc -march=amdgcn -mcpu=tahiti < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tahiti < %s | FileCheck -check-prefix=GCN -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga < %s | FileCheck -check-prefix=GCN -check-prefix=VI -check-prefix=FUNC %s

				declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
				declare i32 @llvm.amdgcn.workitem.id.y() nounwind readnone

	; FUNC-LABEL: {{^}}test_copy_v4i8:			; FUNC-LABEL: {{^}}test_copy_v4i8:
	; SI: buffer_load_dword [[REG:v[0-9]+]]			; GCN: buffer_load_dword [[REG:v[0-9]+]]
	; SI: buffer_store_dword [[REG]]			; GCN: buffer_store_dword [[REG]]
	; SI: s_endpgm			; GCN: s_endpgm
	define void @test_copy_v4i8(<4 x i8> addrspace(1)* %out, <4 x i8> addrspace(1)* %in) nounwind {			define void @test_copy_v4i8(<4 x i8> addrspace(1)* %out, <4 x i8> addrspace(1)* %in) nounwind {
	%val = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 4			%val = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 4
	store <4 x i8> %val, <4 x i8> addrspace(1)* %out, align 4			store <4 x i8> %val, <4 x i8> addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}test_copy_v4i8_x2:			; FUNC-LABEL: {{^}}test_copy_v4i8_x2:
	; SI: buffer_load_dword [[REG:v[0-9]+]]			; GCN: buffer_load_dword [[REG:v[0-9]+]]
	; SI: buffer_store_dword [[REG]]			; GCN: buffer_store_dword [[REG]]
	; SI: buffer_store_dword [[REG]]			; GCN: buffer_store_dword [[REG]]
	; SI: s_endpgm			; GCN: s_endpgm
	define void @test_copy_v4i8_x2(<4 x i8> addrspace(1)* %out0, <4 x i8> addrspace(1)* %out1, <4 x i8> addrspace(1)* %in) nounwind {			define void @test_copy_v4i8_x2(<4 x i8> addrspace(1)* %out0, <4 x i8> addrspace(1)* %out1, <4 x i8> addrspace(1)* %in) nounwind {
	%val = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 4			%val = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 4
	store <4 x i8> %val, <4 x i8> addrspace(1)* %out0, align 4			store <4 x i8> %val, <4 x i8> addrspace(1)* %out0, align 4
	store <4 x i8> %val, <4 x i8> addrspace(1)* %out1, align 4			store <4 x i8> %val, <4 x i8> addrspace(1)* %out1, align 4
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}test_copy_v4i8_x3:			; FUNC-LABEL: {{^}}test_copy_v4i8_x3:
	; SI: buffer_load_dword [[REG:v[0-9]+]]			; GCN: buffer_load_dword [[REG:v[0-9]+]]
	; SI: buffer_store_dword [[REG]]			; GCN: buffer_store_dword [[REG]]
	; SI: buffer_store_dword [[REG]]			; GCN: buffer_store_dword [[REG]]
	; SI: buffer_store_dword [[REG]]			; GCN: buffer_store_dword [[REG]]
	; SI: s_endpgm			; GCN: s_endpgm
	define void @test_copy_v4i8_x3(<4 x i8> addrspace(1)* %out0, <4 x i8> addrspace(1)* %out1, <4 x i8> addrspace(1)* %out2, <4 x i8> addrspace(1)* %in) nounwind {			define void @test_copy_v4i8_x3(<4 x i8> addrspace(1)* %out0, <4 x i8> addrspace(1)* %out1, <4 x i8> addrspace(1)* %out2, <4 x i8> addrspace(1)* %in) nounwind {
	%val = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 4			%val = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 4
	store <4 x i8> %val, <4 x i8> addrspace(1)* %out0, align 4			store <4 x i8> %val, <4 x i8> addrspace(1)* %out0, align 4
	store <4 x i8> %val, <4 x i8> addrspace(1)* %out1, align 4			store <4 x i8> %val, <4 x i8> addrspace(1)* %out1, align 4
	store <4 x i8> %val, <4 x i8> addrspace(1)* %out2, align 4			store <4 x i8> %val, <4 x i8> addrspace(1)* %out2, align 4
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}test_copy_v4i8_x4:			; FUNC-LABEL: {{^}}test_copy_v4i8_x4:
	; SI: buffer_load_dword [[REG:v[0-9]+]]			; GCN: buffer_load_dword [[REG:v[0-9]+]]
	; SI: buffer_store_dword [[REG]]			; GCN: buffer_store_dword [[REG]]
	; SI: buffer_store_dword [[REG]]			; GCN: buffer_store_dword [[REG]]
	; SI: buffer_store_dword [[REG]]			; GCN: buffer_store_dword [[REG]]
	; SI: buffer_store_dword [[REG]]			; GCN: buffer_store_dword [[REG]]
	; SI: s_endpgm			; GCN: s_endpgm
	define void @test_copy_v4i8_x4(<4 x i8> addrspace(1)* %out0, <4 x i8> addrspace(1)* %out1, <4 x i8> addrspace(1)* %out2, <4 x i8> addrspace(1)* %out3, <4 x i8> addrspace(1)* %in) nounwind {			define void @test_copy_v4i8_x4(<4 x i8> addrspace(1)* %out0, <4 x i8> addrspace(1)* %out1, <4 x i8> addrspace(1)* %out2, <4 x i8> addrspace(1)* %out3, <4 x i8> addrspace(1)* %in) nounwind {
	%val = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 4			%val = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 4
	store <4 x i8> %val, <4 x i8> addrspace(1)* %out0, align 4			store <4 x i8> %val, <4 x i8> addrspace(1)* %out0, align 4
	store <4 x i8> %val, <4 x i8> addrspace(1)* %out1, align 4			store <4 x i8> %val, <4 x i8> addrspace(1)* %out1, align 4
	store <4 x i8> %val, <4 x i8> addrspace(1)* %out2, align 4			store <4 x i8> %val, <4 x i8> addrspace(1)* %out2, align 4
	store <4 x i8> %val, <4 x i8> addrspace(1)* %out3, align 4			store <4 x i8> %val, <4 x i8> addrspace(1)* %out3, align 4
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}test_copy_v4i8_extra_use:			; FUNC-LABEL: {{^}}test_copy_v4i8_extra_use:
	; SI: buffer_load_dword			; GCN: buffer_load_dword
	; SI-DAG: v_lshrrev_b32			; GCN-DAG: v_lshrrev_b32
	; SI: v_and_b32			; GCN: v_and_b32
	; SI: v_or_b32			; GCN: v_or_b32
	; SI-DAG: buffer_store_dword			; GCN-DAG: buffer_store_dword
	; SI-DAG: buffer_store_dword			; GCN-DAG: buffer_store_dword

	; SI: s_endpgm			; GCN: s_endpgm
	define void @test_copy_v4i8_extra_use(<4 x i8> addrspace(1)* %out0, <4 x i8> addrspace(1)* %out1, <4 x i8> addrspace(1)* %in) nounwind {			define void @test_copy_v4i8_extra_use(<4 x i8> addrspace(1)* %out0, <4 x i8> addrspace(1)* %out1, <4 x i8> addrspace(1)* %in) nounwind {
	%val = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 4			%val = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 4
	%add = add <4 x i8> %val, <i8 9, i8 9, i8 9, i8 9>			%add = add <4 x i8> %val, <i8 9, i8 9, i8 9, i8 9>
	store <4 x i8> %val, <4 x i8> addrspace(1)* %out0, align 4			store <4 x i8> %val, <4 x i8> addrspace(1)* %out0, align 4
	store <4 x i8> %add, <4 x i8> addrspace(1)* %out1, align 4			store <4 x i8> %add, <4 x i8> addrspace(1)* %out1, align 4
	ret void			ret void
	}			}

				; FIXME: Need to handle non-uniform case for function below (load without gep).
	; FUNC-LABEL: {{^}}test_copy_v4i8_x2_extra_use:			; FUNC-LABEL: {{^}}test_copy_v4i8_x2_extra_use:
	; SI: buffer_load_dword			; GCN: {{buffer|flat}}_load_dword
	; SI-DAG: v_lshrrev_b32			; GCN-DAG: v_lshrrev_b32
	; SI-DAG: v_add_i32			; SI-DAG: v_add_i32
	; SI-DAG: v_and_b32			; VI-DAG: v_add_u16
	; SI-DAG: v_or_b32			; GCN-DAG: v_and_b32
	; SI-DAG: buffer_store_dword			; GCN-DAG: v_or_b32
	; SI: buffer_store_dword			; GCN-DAG: {{buffer|flat}}_store_dword
	; SI: buffer_store_dword			; GCN: {{buffer|flat}}_store_dword
	; SI: s_endpgm			; GCN: {{buffer|flat}}_store_dword
				; GCN: s_endpgm
	define void @test_copy_v4i8_x2_extra_use(<4 x i8> addrspace(1)* %out0, <4 x i8> addrspace(1)* %out1, <4 x i8> addrspace(1)* %out2, <4 x i8> addrspace(1)* %in) nounwind {			define void @test_copy_v4i8_x2_extra_use(<4 x i8> addrspace(1)* %out0, <4 x i8> addrspace(1)* %out1, <4 x i8> addrspace(1)* %out2, <4 x i8> addrspace(1)* %in) nounwind {
	%val = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 4			%tid.x = call i32 @llvm.amdgcn.workitem.id.x()
				%in.ptr = getelementptr <4 x i8>, <4 x i8> addrspace(1)* %in, i32 %tid.x
				%val = load <4 x i8>, <4 x i8> addrspace(1)* %in.ptr, align 4
	%add = add <4 x i8> %val, <i8 9, i8 9, i8 9, i8 9>			%add = add <4 x i8> %val, <i8 9, i8 9, i8 9, i8 9>
	store <4 x i8> %val, <4 x i8> addrspace(1)* %out0, align 4			store <4 x i8> %val, <4 x i8> addrspace(1)* %out0, align 4
	store <4 x i8> %add, <4 x i8> addrspace(1)* %out1, align 4			store <4 x i8> %add, <4 x i8> addrspace(1)* %out1, align 4
	store <4 x i8> %val, <4 x i8> addrspace(1)* %out2, align 4			store <4 x i8> %val, <4 x i8> addrspace(1)* %out2, align 4
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}test_copy_v3i8_align4:			; FUNC-LABEL: {{^}}test_copy_v3i8_align4:
	; SI: buffer_load_dword			; GCN: buffer_load_dword
	; SI-DAG: buffer_store_short v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}			; GCN-DAG: buffer_store_short v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}
	; SI-DAG: buffer_store_byte v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:2{{$}}			; GCN-DAG: buffer_store_byte v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:2{{$}}
	; SI: s_endpgm			; GCN: s_endpgm
	define void @test_copy_v3i8_align4(<3 x i8> addrspace(1)* %out, <3 x i8> addrspace(1)* %in) nounwind {			define void @test_copy_v3i8_align4(<3 x i8> addrspace(1)* %out, <3 x i8> addrspace(1)* %in) nounwind {
	%val = load <3 x i8>, <3 x i8> addrspace(1)* %in, align 4			%val = load <3 x i8>, <3 x i8> addrspace(1)* %in, align 4
	store <3 x i8> %val, <3 x i8> addrspace(1)* %out, align 4			store <3 x i8> %val, <3 x i8> addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}test_copy_v3i8_align2:			; FUNC-LABEL: {{^}}test_copy_v3i8_align2:
	; SI-DAG: buffer_load_ushort v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}			; GCN-DAG: buffer_load_ushort v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}
	; SI-DAG: buffer_load_ubyte v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:2{{$}}			; GCN-DAG: buffer_load_ubyte v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:2{{$}}
	; SI-DAG: buffer_store_short v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}			; GCN-DAG: buffer_store_short v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}
	; SI-DAG: buffer_store_byte v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:2{{$}}			; GCN-DAG: buffer_store_byte v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:2{{$}}
	; SI: s_endpgm			; GCN: s_endpgm
	define void @test_copy_v3i8_align2(<3 x i8> addrspace(1)* %out, <3 x i8> addrspace(1)* %in) nounwind {			define void @test_copy_v3i8_align2(<3 x i8> addrspace(1)* %out, <3 x i8> addrspace(1)* %in) nounwind {
	%val = load <3 x i8>, <3 x i8> addrspace(1)* %in, align 2			%val = load <3 x i8>, <3 x i8> addrspace(1)* %in, align 2
	store <3 x i8> %val, <3 x i8> addrspace(1)* %out, align 2			store <3 x i8> %val, <3 x i8> addrspace(1)* %out, align 2
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}test_copy_v3i8_align1:			; FUNC-LABEL: {{^}}test_copy_v3i8_align1:
	; SI: buffer_load_ubyte			; GCN: buffer_load_ubyte
	; SI: buffer_load_ubyte			; GCN: buffer_load_ubyte
	; SI: buffer_load_ubyte			; GCN: buffer_load_ubyte

	; SI: buffer_store_byte			; GCN: buffer_store_byte
	; SI: buffer_store_byte			; GCN: buffer_store_byte
	; SI: buffer_store_byte			; GCN: buffer_store_byte
	; SI: s_endpgm			; GCN: s_endpgm
	define void @test_copy_v3i8_align1(<3 x i8> addrspace(1)* %out, <3 x i8> addrspace(1)* %in) nounwind {			define void @test_copy_v3i8_align1(<3 x i8> addrspace(1)* %out, <3 x i8> addrspace(1)* %in) nounwind {
	%val = load <3 x i8>, <3 x i8> addrspace(1)* %in, align 1			%val = load <3 x i8>, <3 x i8> addrspace(1)* %in, align 1
	store <3 x i8> %val, <3 x i8> addrspace(1)* %out, align 1			store <3 x i8> %val, <3 x i8> addrspace(1)* %out, align 1
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}test_copy_v4i8_volatile_load:			; FUNC-LABEL: {{^}}test_copy_v4i8_volatile_load:
	; SI: buffer_load_ubyte			; GCN: buffer_load_ubyte
	; SI: buffer_load_ubyte			; GCN: buffer_load_ubyte
	; SI: buffer_load_ubyte			; GCN: buffer_load_ubyte
	; SI: buffer_load_ubyte			; GCN: buffer_load_ubyte
	; SI: buffer_store_dword			; GCN: buffer_store_dword
	; SI: s_endpgm			; GCN: s_endpgm
	define void @test_copy_v4i8_volatile_load(<4 x i8> addrspace(1)* %out, <4 x i8> addrspace(1)* %in) nounwind {			define void @test_copy_v4i8_volatile_load(<4 x i8> addrspace(1)* %out, <4 x i8> addrspace(1)* %in) nounwind {
	%val = load volatile <4 x i8>, <4 x i8> addrspace(1)* %in, align 4			%val = load volatile <4 x i8>, <4 x i8> addrspace(1)* %in, align 4
	store <4 x i8> %val, <4 x i8> addrspace(1)* %out, align 4			store <4 x i8> %val, <4 x i8> addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}test_copy_v4i8_volatile_store:			; FUNC-LABEL: {{^}}test_copy_v4i8_volatile_store:
	; SI: buffer_load_ubyte			; GCN: buffer_load_ubyte
	; SI: buffer_load_ubyte			; GCN: buffer_load_ubyte
	; SI: buffer_load_ubyte			; GCN: buffer_load_ubyte
	; SI: buffer_load_ubyte			; GCN: buffer_load_ubyte
	; SI: buffer_store_byte			; GCN: buffer_store_byte
	; SI: buffer_store_byte			; GCN: buffer_store_byte
	; SI: buffer_store_byte			; GCN: buffer_store_byte
	; SI: buffer_store_byte			; GCN: buffer_store_byte
	; SI: s_endpgm			; GCN: s_endpgm
	define void @test_copy_v4i8_volatile_store(<4 x i8> addrspace(1)* %out, <4 x i8> addrspace(1)* %in) nounwind {			define void @test_copy_v4i8_volatile_store(<4 x i8> addrspace(1)* %out, <4 x i8> addrspace(1)* %in) nounwind {
	%val = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 4			%val = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 4
	store volatile <4 x i8> %val, <4 x i8> addrspace(1)* %out, align 4			store volatile <4 x i8> %val, <4 x i8> addrspace(1)* %out, align 4
	ret void			ret void
	}			}

test/CodeGen/AMDGPU/ctlz.ll

define void @v_ctlz_v4i32(<4 x i32> addrspace(1)* noalias %out, <4 x i32> addrspace(1)* noalias %valptr) nounwind {
store <4 x i32> %ctlz, <4 x i32> addrspace(1)* %out, align 16		store <4 x i32> %ctlz, <4 x i32> addrspace(1)* %out, align 16
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctlz_i8:		; FUNC-LABEL: {{^}}v_ctlz_i8:
; GCN: buffer_load_ubyte [[VAL:v[0-9]+]],		; GCN: buffer_load_ubyte [[VAL:v[0-9]+]],
; GCN-DAG: v_ffbh_u32_e32 [[RESULT:v[0-9]+]], [[VAL]]		; GCN-DAG: v_ffbh_u32_e32 [[RESULT:v[0-9]+]], [[VAL]]
; GCN: buffer_store_byte [[RESULT]],		; GCN: buffer_store_byte [[RESULT]],
		; GCN: s_endpgm
define void @v_ctlz_i8(i8 addrspace(1)* noalias %out, i8 addrspace(1)* noalias %valptr) nounwind {		define void @v_ctlz_i8(i8 addrspace(1)* noalias %out, i8 addrspace(1)* noalias %valptr) nounwind {
%val = load i8, i8 addrspace(1)* %valptr		%val = load i8, i8 addrspace(1)* %valptr
%ctlz = call i8 @llvm.ctlz.i8(i8 %val, i1 false) nounwind readnone		%ctlz = call i8 @llvm.ctlz.i8(i8 %val, i1 false) nounwind readnone
store i8 %ctlz, i8 addrspace(1)* %out		store i8 %ctlz, i8 addrspace(1)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}s_ctlz_i64:		; FUNC-LABEL: {{^}}s_ctlz_i64:

test/CodeGen/AMDGPU/ctlz_zero_undef.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC -check-prefix=GCN %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=FUNC -check-prefix=GCN %s
	; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s			; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s

	declare i8 @llvm.ctlz.i8(i8, i1) nounwind readnone			declare i8 @llvm.ctlz.i8(i8, i1) nounwind readnone

	declare i32 @llvm.ctlz.i32(i32, i1) nounwind readnone			declare i32 @llvm.ctlz.i32(i32, i1) nounwind readnone
	declare <2 x i32> @llvm.ctlz.v2i32(<2 x i32>, i1) nounwind readnone			declare <2 x i32> @llvm.ctlz.v2i32(<2 x i32>, i1) nounwind readnone
	declare <4 x i32> @llvm.ctlz.v4i32(<4 x i32>, i1) nounwind readnone			declare <4 x i32> @llvm.ctlz.v4i32(<4 x i32>, i1) nounwind readnone


test/CodeGen/AMDGPU/cube.ll

define void @cube(<4 x float> addrspace(1)* %out, float %a, float %b, float %c) #1 {
%vec1 = insertelement <4 x float> %vec0, float %cubesc, i32 1		%vec1 = insertelement <4 x float> %vec0, float %cubesc, i32 1
%vec2 = insertelement <4 x float> %vec1, float %cubetc, i32 2		%vec2 = insertelement <4 x float> %vec1, float %cubetc, i32 2
%vec3 = insertelement <4 x float> %vec2, float %cubema, i32 3		%vec3 = insertelement <4 x float> %vec2, float %cubema, i32 3
store <4 x float> %vec3, <4 x float> addrspace(1)* %out		store <4 x float> %vec3, <4 x float> addrspace(1)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}legacy_cube:		; GCN-LABEL: {{^}}legacy_cube:
; GCN-DAG: v_cubeid_f32 v{{[0-9]+}}, s{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GCN-DAG: v_cubeid_f32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, s{{[0-9]+}}
; GCN-DAG: v_cubesc_f32 v{{[0-9]+}}, s{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GCN-DAG: v_cubesc_f32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, s{{[0-9]+}}
; GCN-DAG: v_cubetc_f32 v{{[0-9]+}}, s{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GCN-DAG: v_cubetc_f32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, s{{[0-9]+}}
; GCN-DAG: v_cubema_f32 v{{[0-9]+}}, s{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GCN-DAG: v_cubema_f32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, s{{[0-9]+}}
; GCN: buffer_store_dwordx4		; GCN: buffer_store_dwordx4
define void @legacy_cube(<4 x float> addrspace(1)* %out, <4 x float> %abcx) #1 {		define void @legacy_cube(<4 x float> addrspace(1)* %out, <4 x float> %abcx) #1 {
%cube = call <4 x float> @llvm.AMDGPU.cube(<4 x float> %abcx)		%cube = call <4 x float> @llvm.AMDGPU.cube(<4 x float> %abcx)
store <4 x float> %cube, <4 x float> addrspace(1)* %out		store <4 x float> %cube, <4 x float> addrspace(1)* %out
ret void		ret void
}		}

attributes #0 = { nounwind readnone }		attributes #0 = { nounwind readnone }
attributes #1 = { nounwind }		attributes #1 = { nounwind }

test/CodeGen/AMDGPU/cvt_f32_ubyte.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=SI %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=VI %s

	declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	declare i32 @llvm.amdgcn.workitem.id.y() nounwind readnone			declare i32 @llvm.amdgcn.workitem.id.y() nounwind readnone

	; SI-LABEL: {{^}}load_i8_to_f32:			; GCN-LABEL: {{^}}load_i8_to_f32:
	; SI: buffer_load_ubyte [[LOADREG:v[0-9]+]],			; GCN: buffer_load_ubyte [[LOADREG:v[0-9]+]],
	; SI-NOT: bfe			; GCN-NOT: bfe
	; SI-NOT: lshr			; GCN-NOT: lshr
	; SI: v_cvt_f32_ubyte0_e32 [[CONV:v[0-9]+]], [[LOADREG]]			; GCN: v_cvt_f32_ubyte0_e32 [[CONV:v[0-9]+]], [[LOADREG]]
	; SI: buffer_store_dword [[CONV]],			; GCN: buffer_store_dword [[CONV]],
	define void @load_i8_to_f32(float addrspace(1)* noalias %out, i8 addrspace(1)* noalias %in) nounwind {			define void @load_i8_to_f32(float addrspace(1)* noalias %out, i8 addrspace(1)* noalias %in) nounwind {
	%load = load i8, i8 addrspace(1)* %in, align 1			%load = load i8, i8 addrspace(1)* %in, align 1
	%cvt = uitofp i8 %load to float			%cvt = uitofp i8 %load to float
	store float %cvt, float addrspace(1)* %out, align 4			store float %cvt, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}load_v2i8_to_v2f32:			; GCN-LABEL: {{^}}load_v2i8_to_v2f32:
	; SI: buffer_load_ushort [[LD:v[0-9]+]]			; GCN: buffer_load_ushort [[LD:v[0-9]+]]
	; SI-DAG: v_cvt_f32_ubyte1_e32 v[[HIRESULT:[0-9]+]], [[LD]]			; GCN-DAG: v_cvt_f32_ubyte1_e32 v[[HIRESULT:[0-9]+]], [[LD]]
	; SI-DAG: v_cvt_f32_ubyte0_e32 v[[LORESULT:[0-9]+]], [[LD]]			; GCN-DAG: v_cvt_f32_ubyte0_e32 v[[LORESULT:[0-9]+]], [[LD]]
	; SI: buffer_store_dwordx2 v{{\[}}[[LORESULT]]:[[HIRESULT]]{{\]}},			; GCN: buffer_store_dwordx2 v{{\[}}[[LORESULT]]:[[HIRESULT]]{{\]}},
	define void @load_v2i8_to_v2f32(<2 x float> addrspace(1)* noalias %out, <2 x i8> addrspace(1)* noalias %in) nounwind {			define void @load_v2i8_to_v2f32(<2 x float> addrspace(1)* noalias %out, <2 x i8> addrspace(1)* noalias %in) nounwind {
	%load = load <2 x i8>, <2 x i8> addrspace(1)* %in, align 2			%load = load <2 x i8>, <2 x i8> addrspace(1)* %in, align 2
	%cvt = uitofp <2 x i8> %load to <2 x float>			%cvt = uitofp <2 x i8> %load to <2 x float>
	store <2 x float> %cvt, <2 x float> addrspace(1)* %out, align 16			store <2 x float> %cvt, <2 x float> addrspace(1)* %out, align 16
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}load_v3i8_to_v3f32:			; GCN-LABEL: {{^}}load_v3i8_to_v3f32:
	; SI: buffer_load_dword [[VAL:v[0-9]+]]			; GCN: buffer_load_dword [[VAL:v[0-9]+]]
	; SI-NOT: v_cvt_f32_ubyte3_e32			; GCN-NOT: v_cvt_f32_ubyte3_e32
	; SI-DAG: v_cvt_f32_ubyte2_e32 v{{[0-9]+}}, [[VAL]]			; GCN-DAG: v_cvt_f32_ubyte2_e32 v{{[0-9]+}}, [[VAL]]
	; SI-DAG: v_cvt_f32_ubyte1_e32 v[[HIRESULT:[0-9]+]], [[VAL]]			; GCN-DAG: v_cvt_f32_ubyte1_e32 v[[HIRESULT:[0-9]+]], [[VAL]]
	; SI-DAG: v_cvt_f32_ubyte0_e32 v[[LORESULT:[0-9]+]], [[VAL]]			; GCN-DAG: v_cvt_f32_ubyte0_e32 v[[LORESULT:[0-9]+]], [[VAL]]
	; SI: buffer_store_dwordx2 v{{\[}}[[LORESULT]]:[[HIRESULT]]{{\]}},			; GCN: buffer_store_dwordx2 v{{\[}}[[LORESULT]]:[[HIRESULT]]{{\]}},
	define void @load_v3i8_to_v3f32(<3 x float> addrspace(1)* noalias %out, <3 x i8> addrspace(1)* noalias %in) nounwind {			define void @load_v3i8_to_v3f32(<3 x float> addrspace(1)* noalias %out, <3 x i8> addrspace(1)* noalias %in) nounwind {
	%load = load <3 x i8>, <3 x i8> addrspace(1)* %in, align 4			%load = load <3 x i8>, <3 x i8> addrspace(1)* %in, align 4
	%cvt = uitofp <3 x i8> %load to <3 x float>			%cvt = uitofp <3 x i8> %load to <3 x float>
	store <3 x float> %cvt, <3 x float> addrspace(1)* %out, align 16			store <3 x float> %cvt, <3 x float> addrspace(1)* %out, align 16
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}load_v4i8_to_v4f32:			; GCN-LABEL: {{^}}load_v4i8_to_v4f32:
	; SI: buffer_load_dword [[LOADREG:v[0-9]+]]			; GCN: buffer_load_dword [[LOADREG:v[0-9]+]]
	; SI-NOT: bfe			; GCN-NOT: bfe
	; SI-NOT: lshr			; GCN-NOT: lshr
	; SI-DAG: v_cvt_f32_ubyte3_e32 v[[HIRESULT:[0-9]+]], [[LOADREG]]			; GCN-DAG: v_cvt_f32_ubyte3_e32 v[[HIRESULT:[0-9]+]], [[LOADREG]]
	; SI-DAG: v_cvt_f32_ubyte2_e32 v{{[0-9]+}}, [[LOADREG]]			; GCN-DAG: v_cvt_f32_ubyte2_e32 v{{[0-9]+}}, [[LOADREG]]
	; SI-DAG: v_cvt_f32_ubyte1_e32 v{{[0-9]+}}, [[LOADREG]]			; GCN-DAG: v_cvt_f32_ubyte1_e32 v{{[0-9]+}}, [[LOADREG]]
	; SI-DAG: v_cvt_f32_ubyte0_e32 v[[LORESULT:[0-9]+]], [[LOADREG]]			; GCN-DAG: v_cvt_f32_ubyte0_e32 v[[LORESULT:[0-9]+]], [[LOADREG]]
	; SI: buffer_store_dwordx4 v{{\[}}[[LORESULT]]:[[HIRESULT]]{{\]}},			; GCN: buffer_store_dwordx4 v{{\[}}[[LORESULT]]:[[HIRESULT]]{{\]}},
	define void @load_v4i8_to_v4f32(<4 x float> addrspace(1)* noalias %out, <4 x i8> addrspace(1)* noalias %in) nounwind {			define void @load_v4i8_to_v4f32(<4 x float> addrspace(1)* noalias %out, <4 x i8> addrspace(1)* noalias %in) nounwind {
	%load = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 4			%load = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 4
	%cvt = uitofp <4 x i8> %load to <4 x float>			%cvt = uitofp <4 x i8> %load to <4 x float>
	store <4 x float> %cvt, <4 x float> addrspace(1)* %out, align 16			store <4 x float> %cvt, <4 x float> addrspace(1)* %out, align 16
	ret void			ret void
	}			}

	; This should not be adding instructions to shift into the correct			; This should not be adding instructions to shift into the correct
	; position in the word for the component.			; position in the word for the component.

	; FIXME: Packing bytes			; FIXME: Packing bytes
	; SI-LABEL: {{^}}load_v4i8_to_v4f32_unaligned:			; GCN-LABEL: {{^}}load_v4i8_to_v4f32_unaligned:
	; SI: buffer_load_ubyte [[LOADREG3:v[0-9]+]]			; GCN: buffer_load_ubyte [[LOADREG3:v[0-9]+]]
	; SI: buffer_load_ubyte [[LOADREG2:v[0-9]+]]			; GCN: buffer_load_ubyte [[LOADREG2:v[0-9]+]]
	; SI: buffer_load_ubyte [[LOADREG1:v[0-9]+]]			; GCN: buffer_load_ubyte [[LOADREG1:v[0-9]+]]
	; SI: buffer_load_ubyte [[LOADREG0:v[0-9]+]]			; GCN: buffer_load_ubyte [[LOADREG0:v[0-9]+]]
	; SI-DAG: v_lshlrev_b32			; GCN-DAG: v_lshlrev_b32
	; SI-DAG: v_or_b32			; GCN-DAG: v_or_b32
	; SI-DAG: v_cvt_f32_ubyte0_e32 v[[LORESULT:[0-9]+]],			; GCN-DAG: v_cvt_f32_ubyte0_e32 v[[LORESULT:[0-9]+]],
	; SI-DAG: v_cvt_f32_ubyte0_e32 v{{[0-9]+}},			; GCN-DAG: v_cvt_f32_ubyte0_e32 v{{[0-9]+}},
	; SI-DAG: v_cvt_f32_ubyte0_e32 v{{[0-9]+}},			; GCN-DAG: v_cvt_f32_ubyte0_e32 v{{[0-9]+}},
	; SI-DAG: v_cvt_f32_ubyte0_e32 v[[HIRESULT:[0-9]+]]			; GCN-DAG: v_cvt_f32_ubyte0_e32 v[[HIRESULT:[0-9]+]]

	; SI: buffer_store_dwordx4			; GCN: buffer_store_dwordx4
	define void @load_v4i8_to_v4f32_unaligned(<4 x float> addrspace(1)* noalias %out, <4 x i8> addrspace(1)* noalias %in) nounwind {			define void @load_v4i8_to_v4f32_unaligned(<4 x float> addrspace(1)* noalias %out, <4 x i8> addrspace(1)* noalias %in) nounwind {
	%load = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 1			%load = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 1
	%cvt = uitofp <4 x i8> %load to <4 x float>			%cvt = uitofp <4 x i8> %load to <4 x float>
	store <4 x float> %cvt, <4 x float> addrspace(1)* %out, align 16			store <4 x float> %cvt, <4 x float> addrspace(1)* %out, align 16
	ret void			ret void
	}			}

	; FIXME: Need to handle non-uniform case for function below (load without gep).			; FIXME: Need to handle non-uniform case for function below (load without gep).
	; Instructions still emitted to repack bytes for add use.			; Instructions still emitted to repack bytes for add use.
	; SI-LABEL: {{^}}load_v4i8_to_v4f32_2_uses:
	; SI: {{buffer|flat}}_load_dword
	; SI-DAG: v_cvt_f32_ubyte0_e32
	; SI-DAG: v_cvt_f32_ubyte1_e32
	; SI-DAG: v_cvt_f32_ubyte2_e32
	; SI-DAG: v_cvt_f32_ubyte3_e32

	; SI-DAG: v_lshrrev_b32_e32 v{{[0-9]+}}, 24			; GCN-LABEL: {{^}}load_v4i8_to_v4f32_2_uses:
	; SI-DAG: v_lshrrev_b32_e32 v{{[0-9]+}}, 16			; GCN: {{buffer|flat}}_load_dword
				; GCN-DAG: v_cvt_f32_ubyte0_e32
				; GCN-DAG: v_cvt_f32_ubyte1_e32
				; GCN-DAG: v_cvt_f32_ubyte2_e32
				; GCN-DAG: v_cvt_f32_ubyte3_e32

				; GCN-DAG: v_lshrrev_b32_e32 v{{[0-9]+}}, 24
				; GCN-DAG: v_lshrrev_b32_e32 v{{[0-9]+}}, 16

	; SI-DAG: v_lshlrev_b32_e32 v{{[0-9]+}}, 16			; SI-DAG: v_lshlrev_b32_e32 v{{[0-9]+}}, 16
	; SI-DAG: v_lshlrev_b32_e32 v{{[0-9]+}}, 8			; SI-DAG: v_lshlrev_b32_e32 v{{[0-9]+}}, 8
	; SI-DAG: v_and_b32_e32 v{{[0-9]+}}, 0xffff,			; SI-DAG: v_and_b32_e32 v{{[0-9]+}}, 0xffff,
	; SI-DAG: v_and_b32_e32 v{{[0-9]+}}, 0xff00,			; SI-DAG: v_and_b32_e32 v{{[0-9]+}}, 0xff00,
	; SI-DAG: v_add_i32			; SI-DAG: v_add_i32

	; SI: {{buffer|flat}}_store_dwordx4			; VI-DAG: v_and_b32_e32 v{{[0-9]+}}, 0xffffff00,
	; SI: {{buffer|flat}}_store_dword			; VI-DAG: v_add_u16_e32
				; VI-DAG: v_add_u16_e32

				; GCN: {{buffer|flat}}_store_dwordx4
				; GCN: {{buffer|flat}}_store_dword

	; SI: s_endpgm			; GCN: s_endpgm
	define void @load_v4i8_to_v4f32_2_uses(<4 x float> addrspace(1)* noalias %out, <4 x i8> addrspace(1)* noalias %out2, <4 x i8> addrspace(1)* noalias %in) nounwind {			define void @load_v4i8_to_v4f32_2_uses(<4 x float> addrspace(1)* noalias %out, <4 x i8> addrspace(1)* noalias %out2, <4 x i8> addrspace(1)* noalias %in) nounwind {
	%tid.x = call i32 @llvm.amdgcn.workitem.id.x()			%tid.x = call i32 @llvm.amdgcn.workitem.id.x()
	%in.ptr = getelementptr <4 x i8>, <4 x i8> addrspace(1)* %in, i32 %tid.x			%in.ptr = getelementptr <4 x i8>, <4 x i8> addrspace(1)* %in, i32 %tid.x
	%load = load <4 x i8>, <4 x i8> addrspace(1)* %in.ptr, align 4			%load = load <4 x i8>, <4 x i8> addrspace(1)* %in.ptr, align 4
	%cvt = uitofp <4 x i8> %load to <4 x float>			%cvt = uitofp <4 x i8> %load to <4 x float>
	store <4 x float> %cvt, <4 x float> addrspace(1)* %out, align 16			store <4 x float> %cvt, <4 x float> addrspace(1)* %out, align 16
	%add = add <4 x i8> %load, <i8 9, i8 9, i8 9, i8 9> ; Second use of %load			%add = add <4 x i8> %load, <i8 9, i8 9, i8 9, i8 9> ; Second use of %load
	store <4 x i8> %add, <4 x i8> addrspace(1)* %out2, align 4			store <4 x i8> %add, <4 x i8> addrspace(1)* %out2, align 4
	ret void			ret void
	}			}

	; Make sure this doesn't crash.			; Make sure this doesn't crash.
	; SI-LABEL: {{^}}load_v7i8_to_v7f32:			; GCN-LABEL: {{^}}load_v7i8_to_v7f32:
	; SI: s_endpgm			; GCN: s_endpgm
	define void @load_v7i8_to_v7f32(<7 x float> addrspace(1)* noalias %out, <7 x i8> addrspace(1)* noalias %in) nounwind {			define void @load_v7i8_to_v7f32(<7 x float> addrspace(1)* noalias %out, <7 x i8> addrspace(1)* noalias %in) nounwind {
	%load = load <7 x i8>, <7 x i8> addrspace(1)* %in, align 1			%load = load <7 x i8>, <7 x i8> addrspace(1)* %in, align 1
	%cvt = uitofp <7 x i8> %load to <7 x float>			%cvt = uitofp <7 x i8> %load to <7 x float>
	store <7 x float> %cvt, <7 x float> addrspace(1)* %out, align 16			store <7 x float> %cvt, <7 x float> addrspace(1)* %out, align 16
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}load_v8i8_to_v8f32:			; GCN-LABEL: {{^}}load_v8i8_to_v8f32:
	; SI: buffer_load_dwordx2 v{{\[}}[[LOLOAD:[0-9]+]]:[[HILOAD:[0-9]+]]{{\]}},			; GCN: buffer_load_dwordx2 v{{\[}}[[LOLOAD:[0-9]+]]:[[HILOAD:[0-9]+]]{{\]}},
	; SI-NOT: bfe			; GCN-NOT: bfe
	; SI-NOT: lshr			; GCN-NOT: lshr
	; SI-DAG: v_cvt_f32_ubyte3_e32 v{{[0-9]+}}, v[[LOLOAD]]			; GCN-DAG: v_cvt_f32_ubyte3_e32 v{{[0-9]+}}, v[[LOLOAD]]
	; SI-DAG: v_cvt_f32_ubyte2_e32 v{{[0-9]+}}, v[[LOLOAD]]			; GCN-DAG: v_cvt_f32_ubyte2_e32 v{{[0-9]+}}, v[[LOLOAD]]
	; SI-DAG: v_cvt_f32_ubyte1_e32 v{{[0-9]+}}, v[[LOLOAD]]			; GCN-DAG: v_cvt_f32_ubyte1_e32 v{{[0-9]+}}, v[[LOLOAD]]
	; SI-DAG: v_cvt_f32_ubyte0_e32 v{{[0-9]+}}, v[[LOLOAD]]			; GCN-DAG: v_cvt_f32_ubyte0_e32 v{{[0-9]+}}, v[[LOLOAD]]
	; SI-DAG: v_cvt_f32_ubyte3_e32 v{{[0-9]+}}, v[[HILOAD]]			; GCN-DAG: v_cvt_f32_ubyte3_e32 v{{[0-9]+}}, v[[HILOAD]]
	; SI-DAG: v_cvt_f32_ubyte2_e32 v{{[0-9]+}}, v[[HILOAD]]			; GCN-DAG: v_cvt_f32_ubyte2_e32 v{{[0-9]+}}, v[[HILOAD]]
	; SI-DAG: v_cvt_f32_ubyte1_e32 v{{[0-9]+}}, v[[HILOAD]]			; GCN-DAG: v_cvt_f32_ubyte1_e32 v{{[0-9]+}}, v[[HILOAD]]
	; SI-DAG: v_cvt_f32_ubyte0_e32 v{{[0-9]+}}, v[[HILOAD]]			; GCN-DAG: v_cvt_f32_ubyte0_e32 v{{[0-9]+}}, v[[HILOAD]]
	; SI-NOT: bfe			; GCN-NOT: bfe
	; SI-NOT: lshr			; GCN-NOT: lshr
	; SI: buffer_store_dwordx4			; GCN: buffer_store_dwordx4
	; SI: buffer_store_dwordx4			; GCN: buffer_store_dwordx4
	define void @load_v8i8_to_v8f32(<8 x float> addrspace(1)* noalias %out, <8 x i8> addrspace(1)* noalias %in) nounwind {			define void @load_v8i8_to_v8f32(<8 x float> addrspace(1)* noalias %out, <8 x i8> addrspace(1)* noalias %in) nounwind {
	%load = load <8 x i8>, <8 x i8> addrspace(1)* %in, align 8			%load = load <8 x i8>, <8 x i8> addrspace(1)* %in, align 8
	%cvt = uitofp <8 x i8> %load to <8 x float>			%cvt = uitofp <8 x i8> %load to <8 x float>
	store <8 x float> %cvt, <8 x float> addrspace(1)* %out, align 16			store <8 x float> %cvt, <8 x float> addrspace(1)* %out, align 16
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}i8_zext_inreg_i32_to_f32:			; GCN-LABEL: {{^}}i8_zext_inreg_i32_to_f32:
	; SI: buffer_load_dword [[LOADREG:v[0-9]+]],			; GCN: buffer_load_dword [[LOADREG:v[0-9]+]],
	; SI: v_add_i32_e32 [[ADD:v[0-9]+]], vcc, 2, [[LOADREG]]			; GCN: v_add_i32_e32 [[ADD:v[0-9]+]], vcc, 2, [[LOADREG]]
	; SI-NEXT: v_cvt_f32_ubyte0_e32 [[CONV:v[0-9]+]], [[ADD]]			; GCN-NEXT: v_cvt_f32_ubyte0_e32 [[CONV:v[0-9]+]], [[ADD]]
	; SI: buffer_store_dword [[CONV]],			; GCN: buffer_store_dword [[CONV]],
	define void @i8_zext_inreg_i32_to_f32(float addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {			define void @i8_zext_inreg_i32_to_f32(float addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {
	%load = load i32, i32 addrspace(1)* %in, align 4			%load = load i32, i32 addrspace(1)* %in, align 4
	%add = add i32 %load, 2			%add = add i32 %load, 2
	%inreg = and i32 %add, 255			%inreg = and i32 %add, 255
	%cvt = uitofp i32 %inreg to float			%cvt = uitofp i32 %inreg to float
	store float %cvt, float addrspace(1)* %out, align 4			store float %cvt, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}i8_zext_inreg_hi1_to_f32:			; GCN-LABEL: {{^}}i8_zext_inreg_hi1_to_f32:
	define void @i8_zext_inreg_hi1_to_f32(float addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {			define void @i8_zext_inreg_hi1_to_f32(float addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {
	%load = load i32, i32 addrspace(1)* %in, align 4			%load = load i32, i32 addrspace(1)* %in, align 4
	%inreg = and i32 %load, 65280			%inreg = and i32 %load, 65280
	%shr = lshr i32 %inreg, 8			%shr = lshr i32 %inreg, 8
	%cvt = uitofp i32 %shr to float			%cvt = uitofp i32 %shr to float
	store float %cvt, float addrspace(1)* %out, align 4			store float %cvt, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; We don't get these ones because of the zext, but instcombine removes			; We don't get these ones because of the zext, but instcombine removes
	; them so it shouldn't really matter.			; them so it shouldn't really matter.
	; SI-LABEL: {{^}}i8_zext_i32_to_f32:			; GCN-LABEL: {{^}}i8_zext_i32_to_f32:
	define void @i8_zext_i32_to_f32(float addrspace(1)* noalias %out, i8 addrspace(1)* noalias %in) nounwind {			define void @i8_zext_i32_to_f32(float addrspace(1)* noalias %out, i8 addrspace(1)* noalias %in) nounwind {
	%load = load i8, i8 addrspace(1)* %in, align 1			%load = load i8, i8 addrspace(1)* %in, align 1
	%ext = zext i8 %load to i32			%ext = zext i8 %load to i32
	%cvt = uitofp i32 %ext to float			%cvt = uitofp i32 %ext to float
	store float %cvt, float addrspace(1)* %out, align 4			store float %cvt, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}v4i8_zext_v4i32_to_v4f32:			; GCN-LABEL: {{^}}v4i8_zext_v4i32_to_v4f32:
	define void @v4i8_zext_v4i32_to_v4f32(<4 x float> addrspace(1)* noalias %out, <4 x i8> addrspace(1)* noalias %in) nounwind {			define void @v4i8_zext_v4i32_to_v4f32(<4 x float> addrspace(1)* noalias %out, <4 x i8> addrspace(1)* noalias %in) nounwind {
	%load = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 1			%load = load <4 x i8>, <4 x i8> addrspace(1)* %in, align 1
	%ext = zext <4 x i8> %load to <4 x i32>			%ext = zext <4 x i8> %load to <4 x i32>
	%cvt = uitofp <4 x i32> %ext to <4 x float>			%cvt = uitofp <4 x i32> %ext to <4 x float>
	store <4 x float> %cvt, <4 x float> addrspace(1)* %out, align 16			store <4 x float> %cvt, <4 x float> addrspace(1)* %out, align 16
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}extract_byte0_to_f32:			; GCN-LABEL: {{^}}extract_byte0_to_f32:
	; SI: buffer_load_dword [[VAL:v[0-9]+]]			; GCN: buffer_load_dword [[VAL:v[0-9]+]]
	; SI-NOT: [[VAL]]			; GCN-NOT: [[VAL]]
	; SI: v_cvt_f32_ubyte0_e32 [[CONV:v[0-9]+]], [[VAL]]			; GCN: v_cvt_f32_ubyte0_e32 [[CONV:v[0-9]+]], [[VAL]]
	; SI: buffer_store_dword [[CONV]]			; GCN: buffer_store_dword [[CONV]]
	define void @extract_byte0_to_f32(float addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {			define void @extract_byte0_to_f32(float addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {
	%val = load i32, i32 addrspace(1)* %in			%val = load i32, i32 addrspace(1)* %in
	%and = and i32 %val, 255			%and = and i32 %val, 255
	%cvt = uitofp i32 %and to float			%cvt = uitofp i32 %and to float
	store float %cvt, float addrspace(1)* %out			store float %cvt, float addrspace(1)* %out
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}extract_byte1_to_f32:			; GCN-LABEL: {{^}}extract_byte1_to_f32:
	; SI: buffer_load_dword [[VAL:v[0-9]+]]			; GCN: buffer_load_dword [[VAL:v[0-9]+]]
	; SI-NOT: [[VAL]]			; GCN-NOT: [[VAL]]
	; SI: v_cvt_f32_ubyte1_e32 [[CONV:v[0-9]+]], [[VAL]]			; GCN: v_cvt_f32_ubyte1_e32 [[CONV:v[0-9]+]], [[VAL]]
	; SI: buffer_store_dword [[CONV]]			; GCN: buffer_store_dword [[CONV]]
	define void @extract_byte1_to_f32(float addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {			define void @extract_byte1_to_f32(float addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {
	%val = load i32, i32 addrspace(1)* %in			%val = load i32, i32 addrspace(1)* %in
	%srl = lshr i32 %val, 8			%srl = lshr i32 %val, 8
	%and = and i32 %srl, 255			%and = and i32 %srl, 255
	%cvt = uitofp i32 %and to float			%cvt = uitofp i32 %and to float
	store float %cvt, float addrspace(1)* %out			store float %cvt, float addrspace(1)* %out
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}extract_byte2_to_f32:			; GCN-LABEL: {{^}}extract_byte2_to_f32:
	; SI: buffer_load_dword [[VAL:v[0-9]+]]			; GCN: buffer_load_dword [[VAL:v[0-9]+]]
	; SI-NOT: [[VAL]]			; GCN-NOT: [[VAL]]
	; SI: v_cvt_f32_ubyte2_e32 [[CONV:v[0-9]+]], [[VAL]]			; GCN: v_cvt_f32_ubyte2_e32 [[CONV:v[0-9]+]], [[VAL]]
	; SI: buffer_store_dword [[CONV]]			; GCN: buffer_store_dword [[CONV]]
	define void @extract_byte2_to_f32(float addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {			define void @extract_byte2_to_f32(float addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {
	%val = load i32, i32 addrspace(1)* %in			%val = load i32, i32 addrspace(1)* %in
	%srl = lshr i32 %val, 16			%srl = lshr i32 %val, 16
	%and = and i32 %srl, 255			%and = and i32 %srl, 255
	%cvt = uitofp i32 %and to float			%cvt = uitofp i32 %and to float
	store float %cvt, float addrspace(1)* %out			store float %cvt, float addrspace(1)* %out
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}extract_byte3_to_f32:			; GCN-LABEL: {{^}}extract_byte3_to_f32:
	; SI: buffer_load_dword [[VAL:v[0-9]+]]			; GCN: buffer_load_dword [[VAL:v[0-9]+]]
	; SI-NOT: [[VAL]]			; GCN-NOT: [[VAL]]
	; SI: v_cvt_f32_ubyte3_e32 [[CONV:v[0-9]+]], [[VAL]]			; GCN: v_cvt_f32_ubyte3_e32 [[CONV:v[0-9]+]], [[VAL]]
	; SI: buffer_store_dword [[CONV]]			; GCN: buffer_store_dword [[CONV]]
	define void @extract_byte3_to_f32(float addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {			define void @extract_byte3_to_f32(float addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {
	%val = load i32, i32 addrspace(1)* %in			%val = load i32, i32 addrspace(1)* %in
	%srl = lshr i32 %val, 24			%srl = lshr i32 %val, 24
	%and = and i32 %srl, 255			%and = and i32 %srl, 255
	%cvt = uitofp i32 %and to float			%cvt = uitofp i32 %and to float
	store float %cvt, float addrspace(1)* %out			store float %cvt, float addrspace(1)* %out
	ret void			ret void
	}			}

test/CodeGen/AMDGPU/global-extload-i16.ll

This file was added.

				; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
				; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
				; XUN: llc -march=r600 -mcpu=cypress < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s
				; FIXME: cypress is broken because the bigger testcases spill and it's not implemented

				; FUNC-LABEL: {{^}}zextload_global_i16_to_i32:
				; SI: buffer_load_ushort
				; SI: buffer_store_dword
				; SI: s_endpgm
				define void @zextload_global_i16_to_i32(i32 addrspace(1)* %out, i16 addrspace(1)* %in) nounwind {
				%a = load i16, i16 addrspace(1)* %in
				%ext = zext i16 %a to i32
				store i32 %ext, i32 addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}sextload_global_i16_to_i32:
				; SI: buffer_load_sshort
				; SI: buffer_store_dword
				; SI: s_endpgm
				define void @sextload_global_i16_to_i32(i32 addrspace(1)* %out, i16 addrspace(1)* %in) nounwind {
				%a = load i16, i16 addrspace(1)* %in
				%ext = sext i16 %a to i32
				store i32 %ext, i32 addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}zextload_global_v1i16_to_v1i32:
				; SI: buffer_load_ushort
				; SI: s_endpgm
				define void @zextload_global_v1i16_to_v1i32(<1 x i32> addrspace(1)* %out, <1 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <1 x i16>, <1 x i16> addrspace(1)* %in
				%ext = zext <1 x i16> %load to <1 x i32>
				store <1 x i32> %ext, <1 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}sextload_global_v1i16_to_v1i32:
				; SI: buffer_load_sshort
				; SI: s_endpgm
				define void @sextload_global_v1i16_to_v1i32(<1 x i32> addrspace(1)* %out, <1 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <1 x i16>, <1 x i16> addrspace(1)* %in
				%ext = sext <1 x i16> %load to <1 x i32>
				store <1 x i32> %ext, <1 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}zextload_global_v2i16_to_v2i32:
				; SI: s_endpgm
				define void @zextload_global_v2i16_to_v2i32(<2 x i32> addrspace(1)* %out, <2 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <2 x i16>, <2 x i16> addrspace(1)* %in
				%ext = zext <2 x i16> %load to <2 x i32>
				store <2 x i32> %ext, <2 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}sextload_global_v2i16_to_v2i32:
				; SI: s_endpgm
				define void @sextload_global_v2i16_to_v2i32(<2 x i32> addrspace(1)* %out, <2 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <2 x i16>, <2 x i16> addrspace(1)* %in
				%ext = sext <2 x i16> %load to <2 x i32>
				store <2 x i32> %ext, <2 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}zextload_global_v4i16_to_v4i32:
				; SI: s_endpgm
				define void @zextload_global_v4i16_to_v4i32(<4 x i32> addrspace(1)* %out, <4 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <4 x i16>, <4 x i16> addrspace(1)* %in
				%ext = zext <4 x i16> %load to <4 x i32>
				store <4 x i32> %ext, <4 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}sextload_global_v4i16_to_v4i32:
				; SI: s_endpgm
				define void @sextload_global_v4i16_to_v4i32(<4 x i32> addrspace(1)* %out, <4 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <4 x i16>, <4 x i16> addrspace(1)* %in
				%ext = sext <4 x i16> %load to <4 x i32>
				store <4 x i32> %ext, <4 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}zextload_global_v8i16_to_v8i32:
				; SI: s_endpgm
				define void @zextload_global_v8i16_to_v8i32(<8 x i32> addrspace(1)* %out, <8 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <8 x i16>, <8 x i16> addrspace(1)* %in
				%ext = zext <8 x i16> %load to <8 x i32>
				store <8 x i32> %ext, <8 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}sextload_global_v8i16_to_v8i32:
				; SI: s_endpgm
				define void @sextload_global_v8i16_to_v8i32(<8 x i32> addrspace(1)* %out, <8 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <8 x i16>, <8 x i16> addrspace(1)* %in
				%ext = sext <8 x i16> %load to <8 x i32>
				store <8 x i32> %ext, <8 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}zextload_global_v16i16_to_v16i32:
				; SI: s_endpgm
				define void @zextload_global_v16i16_to_v16i32(<16 x i32> addrspace(1)* %out, <16 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <16 x i16>, <16 x i16> addrspace(1)* %in
				%ext = zext <16 x i16> %load to <16 x i32>
				store <16 x i32> %ext, <16 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}sextload_global_v16i16_to_v16i32:
				; SI: s_endpgm
				define void @sextload_global_v16i16_to_v16i32(<16 x i32> addrspace(1)* %out, <16 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <16 x i16>, <16 x i16> addrspace(1)* %in
				%ext = sext <16 x i16> %load to <16 x i32>
				store <16 x i32> %ext, <16 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}zextload_global_v32i16_to_v32i32:
				; SI: s_endpgm
				define void @zextload_global_v32i16_to_v32i32(<32 x i32> addrspace(1)* %out, <32 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <32 x i16>, <32 x i16> addrspace(1)* %in
				%ext = zext <32 x i16> %load to <32 x i32>
				store <32 x i32> %ext, <32 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}sextload_global_v32i16_to_v32i32:
				; SI: s_endpgm
				define void @sextload_global_v32i16_to_v32i32(<32 x i32> addrspace(1)* %out, <32 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <32 x i16>, <32 x i16> addrspace(1)* %in
				%ext = sext <32 x i16> %load to <32 x i32>
				store <32 x i32> %ext, <32 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}zextload_global_v64i16_to_v64i32:
				; SI: s_endpgm
				define void @zextload_global_v64i16_to_v64i32(<64 x i32> addrspace(1)* %out, <64 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <64 x i16>, <64 x i16> addrspace(1)* %in
				%ext = zext <64 x i16> %load to <64 x i32>
				store <64 x i32> %ext, <64 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}sextload_global_v64i16_to_v64i32:
				; SI: s_endpgm
				define void @sextload_global_v64i16_to_v64i32(<64 x i32> addrspace(1)* %out, <64 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <64 x i16>, <64 x i16> addrspace(1)* %in
				%ext = sext <64 x i16> %load to <64 x i32>
				store <64 x i32> %ext, <64 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}zextload_global_i16_to_i64:
				; SI-DAG: buffer_load_ushort v[[LO:[0-9]+]],
				; SI-DAG: v_mov_b32_e32 v[[HI:[0-9]+]], 0{{$}}
				; SI: buffer_store_dwordx2 v{{\[}}[[LO]]:[[HI]]]
				define void @zextload_global_i16_to_i64(i64 addrspace(1)* %out, i16 addrspace(1)* %in) nounwind {
				%a = load i16, i16 addrspace(1)* %in
				%ext = zext i16 %a to i64
				store i64 %ext, i64 addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}sextload_global_i16_to_i64:
				; VI: buffer_load_ushort [[LOAD:v[0-9]+]], s[{{[0-9]+:[0-9]+}}], 0
				; VI: v_ashrrev_i32_e32 v{{[0-9]+}}, 31, [[LOAD]]
				; VI: buffer_store_dwordx2 v[{{[0-9]+:[0-9]+}}], s[{{[0-9]+:[0-9]+}}], 0
				define void @sextload_global_i16_to_i64(i64 addrspace(1)* %out, i16 addrspace(1)* %in) nounwind {
				%a = load i16, i16 addrspace(1)* %in
				%ext = sext i16 %a to i64
				store i64 %ext, i64 addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}zextload_global_v1i16_to_v1i64:
				; SI: s_endpgm
				define void @zextload_global_v1i16_to_v1i64(<1 x i64> addrspace(1)* %out, <1 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <1 x i16>, <1 x i16> addrspace(1)* %in
				%ext = zext <1 x i16> %load to <1 x i64>
				store <1 x i64> %ext, <1 x i64> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}sextload_global_v1i16_to_v1i64:
				; SI: s_endpgm
				define void @sextload_global_v1i16_to_v1i64(<1 x i64> addrspace(1)* %out, <1 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <1 x i16>, <1 x i16> addrspace(1)* %in
				%ext = sext <1 x i16> %load to <1 x i64>
				store <1 x i64> %ext, <1 x i64> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}zextload_global_v2i16_to_v2i64:
				; SI: s_endpgm
				define void @zextload_global_v2i16_to_v2i64(<2 x i64> addrspace(1)* %out, <2 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <2 x i16>, <2 x i16> addrspace(1)* %in
				%ext = zext <2 x i16> %load to <2 x i64>
				store <2 x i64> %ext, <2 x i64> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}sextload_global_v2i16_to_v2i64:
				; SI: s_endpgm
				define void @sextload_global_v2i16_to_v2i64(<2 x i64> addrspace(1)* %out, <2 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <2 x i16>, <2 x i16> addrspace(1)* %in
				%ext = sext <2 x i16> %load to <2 x i64>
				store <2 x i64> %ext, <2 x i64> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}zextload_global_v4i16_to_v4i64:
				; SI: s_endpgm
				define void @zextload_global_v4i16_to_v4i64(<4 x i64> addrspace(1)* %out, <4 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <4 x i16>, <4 x i16> addrspace(1)* %in
				%ext = zext <4 x i16> %load to <4 x i64>
				store <4 x i64> %ext, <4 x i64> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}sextload_global_v4i16_to_v4i64:
				; SI: s_endpgm
				define void @sextload_global_v4i16_to_v4i64(<4 x i64> addrspace(1)* %out, <4 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <4 x i16>, <4 x i16> addrspace(1)* %in
				%ext = sext <4 x i16> %load to <4 x i64>
				store <4 x i64> %ext, <4 x i64> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}zextload_global_v8i16_to_v8i64:
				; SI: s_endpgm
				define void @zextload_global_v8i16_to_v8i64(<8 x i64> addrspace(1)* %out, <8 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <8 x i16>, <8 x i16> addrspace(1)* %in
				%ext = zext <8 x i16> %load to <8 x i64>
				store <8 x i64> %ext, <8 x i64> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}sextload_global_v8i16_to_v8i64:
				; SI: s_endpgm
				define void @sextload_global_v8i16_to_v8i64(<8 x i64> addrspace(1)* %out, <8 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <8 x i16>, <8 x i16> addrspace(1)* %in
				%ext = sext <8 x i16> %load to <8 x i64>
				store <8 x i64> %ext, <8 x i64> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}zextload_global_v16i16_to_v16i64:
				; SI: s_endpgm
				define void @zextload_global_v16i16_to_v16i64(<16 x i64> addrspace(1)* %out, <16 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <16 x i16>, <16 x i16> addrspace(1)* %in
				%ext = zext <16 x i16> %load to <16 x i64>
				store <16 x i64> %ext, <16 x i64> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}sextload_global_v16i16_to_v16i64:
				; SI: s_endpgm
				define void @sextload_global_v16i16_to_v16i64(<16 x i64> addrspace(1)* %out, <16 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <16 x i16>, <16 x i16> addrspace(1)* %in
				%ext = sext <16 x i16> %load to <16 x i64>
				store <16 x i64> %ext, <16 x i64> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}zextload_global_v32i16_to_v32i64:
				; SI: s_endpgm
				define void @zextload_global_v32i16_to_v32i64(<32 x i64> addrspace(1)* %out, <32 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <32 x i16>, <32 x i16> addrspace(1)* %in
				%ext = zext <32 x i16> %load to <32 x i64>
				store <32 x i64> %ext, <32 x i64> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}sextload_global_v32i16_to_v32i64:
				; SI: s_endpgm
				define void @sextload_global_v32i16_to_v32i64(<32 x i64> addrspace(1)* %out, <32 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <32 x i16>, <32 x i16> addrspace(1)* %in
				%ext = sext <32 x i16> %load to <32 x i64>
				store <32 x i64> %ext, <32 x i64> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}zextload_global_v64i16_to_v64i64:
				; SI: s_endpgm
				define void @zextload_global_v64i16_to_v64i64(<64 x i64> addrspace(1)* %out, <64 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <64 x i16>, <64 x i16> addrspace(1)* %in
				%ext = zext <64 x i16> %load to <64 x i64>
				store <64 x i64> %ext, <64 x i64> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}sextload_global_v64i16_to_v64i64:
				; SI: s_endpgm
				define void @sextload_global_v64i16_to_v64i64(<64 x i64> addrspace(1)* %out, <64 x i16> addrspace(1)* nocapture %in) nounwind {
				%load = load <64 x i16>, <64 x i16> addrspace(1)* %in
				%ext = sext <64 x i16> %load to <64 x i64>
				store <64 x i64> %ext, <64 x i64> addrspace(1)* %out
				ret void
				}

test/CodeGen/AMDGPU/half.ll

[373 lines not shown]
define void @global_extload_v2f16_to_v2f64(<2 x double> addrspace(1)* %out, <2 x half> addrspace(1)* %in) #0 {
%val = load <2 x half>, <2 x half> addrspace(1)* %in		%val = load <2 x half>, <2 x half> addrspace(1)* %in
%cvt = fpext <2 x half> %val to <2 x double>		%cvt = fpext <2 x half> %val to <2 x double>
store <2 x double> %cvt, <2 x double> addrspace(1)* %out		store <2 x double> %cvt, <2 x double> addrspace(1)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}global_extload_v3f16_to_v3f64:		; GCN-LABEL: {{^}}global_extload_v3f16_to_v3f64:

; GCN: buffer_load_dwordx2 [[LOAD:v\[[0-9]+:[0-9]+\]]]		; XSI: buffer_load_dwordx2 [[LOAD:v\[[0-9]+:[0-9]+\]]]
; GCN-DAG: v_cvt_f32_f16_e32		; XSI: v_cvt_f32_f16_e32
; GCN-DAG: v_lshrrev_b32_e32 {{v[0-9]+}}, 16, {{v[0-9]+}}		; XSI: v_cvt_f32_f16_e32
; GCN-DAG: v_cvt_f32_f16_e32		; XSI-DAG: v_lshrrev_b32_e32 {{v[0-9]+}}, 16, {{v[0-9]+}}
; GCN-DAG: v_cvt_f32_f16_e32		; XSI: v_cvt_f32_f16_e32
		; XSI-NOT: v_cvt_f32_f16
; GCN: v_cvt_f64_f32_e32
; GCN: v_cvt_f64_f32_e32		; XVI: buffer_load_dwordx2 [[LOAD:v\[[0-9]+:[0-9]+\]]]
; GCN: v_cvt_f64_f32_e32		; XVI: v_cvt_f32_f16_e32
		; XVI: v_cvt_f32_f16_e32
		; XVI-DAG: v_lshrrev_b32_e32 {{v[0-9]+}}, 16, {{v[0-9]+}}
		; XVI: v_cvt_f32_f16_e32
		; XVI-NOT: v_cvt_f32_f16

		; GCN: buffer_load_dwordx2 v{{\[}}[[IN_LO:[0-9]+]]:[[IN_HI:[0-9]+]]
		; GCN: v_cvt_f32_f16_e32 [[Z32:v[0-9]+]], v[[IN_HI]]
		; GCN: v_cvt_f32_f16_e32 [[X32:v[0-9]+]], v[[IN_LO]]
		; GCN: v_lshrrev_b32_e32 [[Y16:v[0-9]+]], 16, v[[IN_LO]]
		; GCN: v_cvt_f32_f16_e32 [[Y32:v[0-9]+]], [[Y16]]

		; GCN: v_cvt_f64_f32_e32 [[Z:v\[[0-9]+:[0-9]+\]]], [[Z32]]
		; GCN: v_cvt_f64_f32_e32 v{{\[}}[[XLO:[0-9]+]]:{{[0-9]+}}], [[X32]]
		; GCN: v_cvt_f64_f32_e32 v[{{[0-9]+}}:[[YHI:[0-9]+]]{{\]}}, [[Y32]]
; GCN-NOT: v_cvt_f64_f32_e32		; GCN-NOT: v_cvt_f64_f32_e32

; GCN-DAG: buffer_store_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}		; GCN-DAG: buffer_store_dwordx4 v{{\[}}[[XLO]]:[[YHI]]{{\]}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}
; GCN-DAG: buffer_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:16		; GCN-DAG: buffer_store_dwordx2 [[Z]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:16
; GCN: s_endpgm		; GCN: s_endpgm
define void @global_extload_v3f16_to_v3f64(<3 x double> addrspace(1)* %out, <3 x half> addrspace(1)* %in) #0 {		define void @global_extload_v3f16_to_v3f64(<3 x double> addrspace(1)* %out, <3 x half> addrspace(1)* %in) #0 {
%val = load <3 x half>, <3 x half> addrspace(1)* %in		%val = load <3 x half>, <3 x half> addrspace(1)* %in
%cvt = fpext <3 x half> %val to <3 x double>		%cvt = fpext <3 x half> %val to <3 x double>
store <3 x double> %cvt, <3 x double> addrspace(1)* %out		store <3 x double> %cvt, <3 x double> addrspace(1)* %out
ret void		ret void
}		}

[220 lines not shown]

test/CodeGen/AMDGPU/llvm.AMDGPU.bfe.u32.ll

; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s		; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC -check-prefix=GCN %s
; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s		; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=FUNC -check-prefix=GCN %s
		; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=FUNC -check-prefix=GCN %s
; RUN: llc -march=r600 -mcpu=redwood -verify-machineinstrs < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s		; RUN: llc -march=r600 -mcpu=redwood -verify-machineinstrs < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s

declare i32 @llvm.AMDGPU.bfe.u32(i32, i32, i32) nounwind readnone		declare i32 @llvm.AMDGPU.bfe.u32(i32, i32, i32) nounwind readnone

; FUNC-LABEL: {{^}}bfe_u32_arg_arg_arg:		; FUNC-LABEL: {{^}}bfe_u32_arg_arg_arg:
; SI: v_bfe_u32		; SI: v_bfe_u32
; EG: BFE_UINT		; EG: BFE_UINT
define void @bfe_u32_arg_arg_arg(i32 addrspace(1)* %out, i32 %src0, i32 %src1, i32 %src2) nounwind {		define void @bfe_u32_arg_arg_arg(i32 addrspace(1)* %out, i32 %src0, i32 %src1, i32 %src2) nounwind {
[57 lines not shown]
define void @bfe_u32_zextload_i8(i32 addrspace(1)* %out, i8 addrspace(1)* %in) nounwind {
%load = load i8, i8 addrspace(1)* %in		%load = load i8, i8 addrspace(1)* %in
%ext = zext i8 %load to i32		%ext = zext i8 %load to i32
%bfe = call i32 @llvm.AMDGPU.bfe.u32(i32 %ext, i32 0, i32 8)		%bfe = call i32 @llvm.AMDGPU.bfe.u32(i32 %ext, i32 0, i32 8)
store i32 %bfe, i32 addrspace(1)* %out, align 4		store i32 %bfe, i32 addrspace(1)* %out, align 4
ret void		ret void
}		}

; FUNC-LABEL: {{^}}bfe_u32_zext_in_reg_i8:		; FUNC-LABEL: {{^}}bfe_u32_zext_in_reg_i8:
; SI: buffer_load_dword		; GCN: buffer_load_dword
; SI: v_add_i32		; SI: v_add_i32
; SI-NEXT: v_and_b32_e32		; SI-NEXT: v_and_b32_e32
		; FIXME: Should be using s_add_i32
		; VI: v_add_i32
		; VI-NEXT: v_and_b32_e32
; SI-NOT: {{[^@]}}bfe		; SI-NOT: {{[^@]}}bfe
; SI: s_endpgm		; GCN: s_endpgm
define void @bfe_u32_zext_in_reg_i8(i32 addrspace(1)* %out, i32 addrspace(1)* %in) nounwind {		define void @bfe_u32_zext_in_reg_i8(i32 addrspace(1)* %out, i32 addrspace(1)* %in) nounwind {
%load = load i32, i32 addrspace(1)* %in, align 4		%load = load i32, i32 addrspace(1)* %in, align 4
%add = add i32 %load, 1		%add = add i32 %load, 1
%ext = and i32 %add, 255		%ext = and i32 %add, 255
%bfe = call i32 @llvm.AMDGPU.bfe.u32(i32 %ext, i32 0, i32 8)		%bfe = call i32 @llvm.AMDGPU.bfe.u32(i32 %ext, i32 0, i32 8)
store i32 %bfe, i32 addrspace(1)* %out, align 4		store i32 %bfe, i32 addrspace(1)* %out, align 4
ret void		ret void
}		}
[539 lines not shown]

test/CodeGen/AMDGPU/load-constant-i16.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-NOHSA -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GCN-NOHSA,GCN-NOHSA-SI,FUNC %s
	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=kaveri -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-HSA -check-prefix=FUNC %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=kaveri -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN -check-prefix=GCN-HSA -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-NOHSA -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GCN-NOHSA,GCN-NOHSA-VI,FUNC %s
	; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s			; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s

	; FUNC-LABEL: {{^}}constant_load_i16:			; FUNC-LABEL: {{^}}constant_load_i16:
	; GCN-NOHSA: buffer_load_ushort v{{[0-9]+}}			; GCN-NOHSA: buffer_load_ushort v{{[0-9]+}}
	; GCN-HSA: flat_load_ushort			; GCN-HSA: flat_load_ushort

	; EG: VTX_READ_16 T{{[0-9]+}}.X, T{{[0-9]+}}.X, 0, #1			; EG: VTX_READ_16 T{{[0-9]+}}.X, T{{[0-9]+}}.X, 0, #1
	define void @constant_load_i16(i16 addrspace(1)* %out, i16 addrspace(2)* %in) {			define void @constant_load_i16(i16 addrspace(1)* %out, i16 addrspace(2)* %in) {
	[411 lines not shown]
	define void @constant_zextload_i16_to_i64(i64 addrspace(1)* %out, i16 addrspace(2)* %in) #0 {			define void @constant_zextload_i16_to_i64(i64 addrspace(1)* %out, i16 addrspace(2)* %in) #0 {
	%a = load i16, i16 addrspace(2)* %in			%a = load i16, i16 addrspace(2)* %in
	%ext = zext i16 %a to i64			%ext = zext i16 %a to i64
	store i64 %ext, i64 addrspace(1)* %out			store i64 %ext, i64 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}constant_sextload_i16_to_i64:			; FUNC-LABEL: {{^}}constant_sextload_i16_to_i64:
	; GCN-NOHSA-DAG: buffer_load_sshort v[[LO:[0-9]+]],			; FIXME: Need to optimize this sequence to avoid extra bfe:
				; t28: i32,ch = load<LD2[%in(addrspace=1)], anyext from i16> t12, t27, undef:i64
				; t31: i64 = any_extend t28
				; t33: i64 = sign_extend_inreg t31, ValueType:ch:i16

				; GCN-NOHSA-SI-DAG: buffer_load_sshort v[[LO:[0-9]+]],
	; GCN-HSA-DAG: flat_load_sshort v[[LO:[0-9]+]],			; GCN-HSA-DAG: flat_load_sshort v[[LO:[0-9]+]],
				; GCN-NOHSA-VI-DAG: buffer_load_ushort v[[ULO:[0-9]+]],
				; GCN-NOHSA-VI-DAG: v_bfe_i32 v[[LO:[0-9]+]], v[[ULO]], 0, 16
	; GCN-DAG: v_ashrrev_i32_e32 v[[HI:[0-9]+]], 31, v[[LO]]			; GCN-DAG: v_ashrrev_i32_e32 v[[HI:[0-9]+]], 31, v[[LO]]

	; GCN-NOHSA: buffer_store_dwordx2 v{{\[}}[[LO]]:[[HI]]]			; GCN-NOHSA: buffer_store_dwordx2 v{{\[}}[[LO]]:[[HI]]]
	; GCN-HSA: flat_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, v{{\[}}[[LO]]:[[HI]]{{\]}}			; GCN-HSA: flat_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, v{{\[}}[[LO]]:[[HI]]{{\]}}

	; EG: VTX_READ_16 T{{[0-9]+}}.X, T{{[0-9]+}}.X, 0, #1			; EG: VTX_READ_16 T{{[0-9]+}}.X, T{{[0-9]+}}.X, 0, #1
	; EG: ASHR {{\*}} {{T[0-9]\.[XYZW]}}, {{.}}, literal			; EG: ASHR {{\*}} {{T[0-9]\.[XYZW]}}, {{.}}, literal
	; TODO: Why not 15 ?			; TODO: Why not 15 ?
	[159 lines not shown]
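
The FIXME in `constant_sextload_i16_to_i64` records that VI now gets a zero-extending load followed by `v_bfe_i32 ..., 0, 16`, where SI gets a single `buffer_load_sshort`. The following is a rough Python model of the two sequences, not the backend's code: the helper names are hypothetical and the instruction semantics are simplified to plain integer arithmetic. Under those assumptions both sequences produce the same 64-bit result, which suggests the extra bfe is a missed optimization rather than a correctness problem.

```python
MASK32 = 0xFFFFFFFF


def load_ushort(halfword):
    """Assumed model of a zero-extending 16-bit load into a 32-bit register."""
    return halfword & 0xFFFF


def load_sshort(halfword):
    """Assumed model of a sign-extending 16-bit load into a 32-bit register."""
    value = halfword & 0xFFFF
    if value & 0x8000:
        value |= 0xFFFF0000
    return value


def bfe_i32(src, offset, width):
    """Signed bitfield extract: take `width` bits at `offset`, sign-extend to 32 bits."""
    field = (src >> offset) & ((1 << width) - 1)
    if field & (1 << (width - 1)):
        field -= 1 << width
    return field & MASK32


def ashr_i32(src, amount):
    """Arithmetic shift right of a 32-bit register value."""
    signed = src - (1 << 32) if src & 0x80000000 else src
    return (signed >> amount) & MASK32


def sext_i16_to_i64_si(halfword):
    """SI sequence: sign-extending load, then ashr by 31 for the high dword."""
    lo = load_sshort(halfword)
    hi = ashr_i32(lo, 31)
    return (hi << 32) | lo


def sext_i16_to_i64_vi(halfword):
    """VI sequence: zero-extending load, bfe_i32 0, 16, then ashr by 31."""
    lo = bfe_i32(load_ushort(halfword), 0, 16)
    hi = ashr_i32(lo, 31)
    return (hi << 32) | lo
```

Comparing `sext_i16_to_i64_si` and `sext_i16_to_i64_vi` over all 65536 halfword values is cheap enough to do exhaustively.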

test/CodeGen/AMDGPU/load-global-i16.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-NOHSA -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GCN-NOHSA,GCN-NOHSA-SI,FUNC %s
	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=kaveri -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-HSA -check-prefix=FUNC %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=kaveri -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GCN-HSA,FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-NOHSA -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GCN-NOHSA,GCN-NOHSA-VI,FUNC %s
	; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s			; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s
	; RUN: llc -march=r600 -mcpu=cayman < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s			; RUN: llc -march=r600 -mcpu=cayman < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s

	; FIXME: r600 is broken because the bigger testcases spill and it's not implemented			; FIXME: r600 is broken because the bigger testcases spill and it's not implemented

	; FUNC-LABEL: {{^}}global_load_i16:			; FUNC-LABEL: {{^}}global_load_i16:
	; GCN-NOHSA: buffer_load_ushort v{{[0-9]+}}			; GCN-NOHSA: buffer_load_ushort v{{[0-9]+}}
	; GCN-HSA: flat_load_ushort			; GCN-HSA: flat_load_ushort
	[427 lines not shown]
	define void @global_zextload_i16_to_i64(i64 addrspace(1)* %out, i16 addrspace(1)* %in) #0 {			define void @global_zextload_i16_to_i64(i64 addrspace(1)* %out, i16 addrspace(1)* %in) #0 {
	%a = load i16, i16 addrspace(1)* %in			%a = load i16, i16 addrspace(1)* %in
	%ext = zext i16 %a to i64			%ext = zext i16 %a to i64
	store i64 %ext, i64 addrspace(1)* %out			store i64 %ext, i64 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}global_sextload_i16_to_i64:			; FUNC-LABEL: {{^}}global_sextload_i16_to_i64:
	; GCN-NOHSA-DAG: buffer_load_sshort v[[LO:[0-9]+]],			; FIXME: Need to optimize this sequence to avoid extra bfe:
				; t28: i32,ch = load<LD2[%in(addrspace=1)], anyext from i16> t12, t27, undef:i64
				; t31: i64 = any_extend t28
				; t33: i64 = sign_extend_inreg t31, ValueType:ch:i16

				; GCN-NOHSA-SI-DAG: buffer_load_sshort v[[LO:[0-9]+]],
	; GCN-HSA-DAG: flat_load_sshort v[[LO:[0-9]+]],			; GCN-HSA-DAG: flat_load_sshort v[[LO:[0-9]+]],
				; GCN-NOHSA-VI-DAG: buffer_load_ushort v[[ULO:[0-9]+]],
				; GCN-NOHSA-VI-DAG: v_bfe_i32 v[[LO:[0-9]+]], v[[ULO]], 0, 16
	; GCN-DAG: v_ashrrev_i32_e32 v[[HI:[0-9]+]], 31, v[[LO]]			; GCN-DAG: v_ashrrev_i32_e32 v[[HI:[0-9]+]], 31, v[[LO]]

	; GCN-NOHSA: buffer_store_dwordx2 v{{\[}}[[LO]]:[[HI]]]			; GCN-NOHSA: buffer_store_dwordx2 v{{\[}}[[LO]]:[[HI]]]
	; GCN-HSA: flat_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, v{{\[}}[[LO]]:[[HI]]{{\]}}			; GCN-HSA: flat_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, v{{\[}}[[LO]]:[[HI]]{{\]}}

	; EG: VTX_READ_16 T{{[0-9]+}}.X, T{{[0-9]+}}.X, 0, #1			; EG: VTX_READ_16 T{{[0-9]+}}.X, T{{[0-9]+}}.X, 0, #1
	; EG: ASHR {{\*}} {{T[0-9]\.[XYZW]}}, {{.}}, literal			; EG: ASHR {{\*}} {{T[0-9]\.[XYZW]}}, {{.}}, literal
	; TODO: Why not 15 ?			; TODO: Why not 15 ?
	[155 lines not shown]

test/CodeGen/AMDGPU/load-global-i8.ll

; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-NOHSA -check-prefix=FUNC %s		; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GCN-NOHSA,SI,FUNC %s
; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=kaveri -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-HSA -check-prefix=FUNC %s		; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=kaveri -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GCN-HSA,SI,FUNC %s
; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-NOHSA -check-prefix=FUNC %s		; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GCN-NOHSA,VI,FUNC %s
; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s		; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s
; RUN: llc -march=r600 -mcpu=cayman < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s		; RUN: llc -march=r600 -mcpu=cayman < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s


; FUNC-LABEL: {{^}}global_load_i8:		; FUNC-LABEL: {{^}}global_load_i8:
; GCN-NOHSA: buffer_load_ubyte v{{[0-9]+}}		; GCN-NOHSA: buffer_load_ubyte v{{[0-9]+}}
; GCN-HSA: flat_load_ubyte		; GCN-HSA: flat_load_ubyte

[146 lines not shown]	define void @global_sextload_v2i8_to_v2i32(<2 x i32> addrspace(1)* %out, <2 x i8> addrspace(1)* %in) #0 {
store <2 x i32> %ext, <2 x i32> addrspace(1)* %out		store <2 x i32> %ext, <2 x i32> addrspace(1)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}global_zextload_v3i8_to_v3i32:		; FUNC-LABEL: {{^}}global_zextload_v3i8_to_v3i32:
; GCN-NOHSA: buffer_load_dword v		; GCN-NOHSA: buffer_load_dword v
; GCN-HSA: flat_load_dword v		; GCN-HSA: flat_load_dword v

; GCN-DAG: v_bfe_u32 v{{[0-9]+}}, v{{[0-9]+}}, 8, 8		; SI-DAG: v_bfe_u32 v{{[0-9]+}}, v{{[0-9]+}}, 8, 8
		; VI-DAG: v_lshrrev_b16_e32 v{{[0-9]+}}, 8, v{{[0-9]+}}
; GCN-DAG: v_bfe_u32 v{{[0-9]+}}, v{{[0-9]+}}, 16, 8		; GCN-DAG: v_bfe_u32 v{{[0-9]+}}, v{{[0-9]+}}, 16, 8
; GCN-DAG: v_and_b32_e32 v{{[0-9]+}}, 0xff,		; GCN-DAG: v_and_b32_e32 v{{[0-9]+}}, 0xff,

; EG: VTX_READ_32 T{{[0-9]+}}.X, T{{[0-9]+}}.X, 0, #1		; EG: VTX_READ_32 T{{[0-9]+}}.X, T{{[0-9]+}}.X, 0, #1
; TODO: These should use DST, but for some there are redundant MOVs		; TODO: These should use DST, but for some there are redundant MOVs
; EG-DAG: BFE_UINT {{[* ]}}T{{[0-9].[XYZW]}}, {{.}}, {{.*}}literal		; EG-DAG: BFE_UINT {{[* ]}}T{{[0-9].[XYZW]}}, {{.}}, {{.*}}literal
; EG-DAG: BFE_UINT {{[* ]}}T{{[0-9].[XYZW]}}, {{.}}, {{.*}}literal		; EG-DAG: BFE_UINT {{[* ]}}T{{[0-9].[XYZW]}}, {{.}}, {{.*}}literal
; EG-DAG: 8		; EG-DAG: 8
; EG-DAG: 8		; EG-DAG: 8
define void @global_zextload_v3i8_to_v3i32(<3 x i32> addrspace(1)* %out, <3 x i8> addrspace(1)* %in) #0 {		define void @global_zextload_v3i8_to_v3i32(<3 x i32> addrspace(1)* %out, <3 x i8> addrspace(1)* %in) #0 {
entry:		entry:
%ld = load <3 x i8>, <3 x i8> addrspace(1)* %in		%ld = load <3 x i8>, <3 x i8> addrspace(1)* %in
%ext = zext <3 x i8> %ld to <3 x i32>		%ext = zext <3 x i8> %ld to <3 x i32>
store <3 x i32> %ext, <3 x i32> addrspace(1)* %out		store <3 x i32> %ext, <3 x i32> addrspace(1)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}global_sextload_v3i8_to_v3i32:		; FUNC-LABEL: {{^}}global_sextload_v3i8_to_v3i32:
; GCN-NOHSA: buffer_load_dword v		; GCN-NOHSA: buffer_load_dword v
; GCN-HSA: flat_load_dword v		; GCN-HSA: flat_load_dword v

; GCN-DAG: v_bfe_i32 v{{[0-9]+}}, v{{[0-9]+}}, 8, 8		;FIXME: Need to optimize this sequence to avoid extra shift on VI.

		; t23: i16 = truncate t18
		; t49: i16 = srl t23, Constant:i32<8>
		; t57: i32 = any_extend t49
		; t58: i32 = sign_extend_inreg t57, ValueType:ch:i8

		; SI-DAG: v_bfe_i32 v{{[0-9]+}}, v{{[0-9]+}}, 8, 8
		; VI-DAG: v_lshrrev_b16_e32 [[SHIFT:v[0-9]+]], 8, v{{[0-9]+}}
		; VI-DAG: v_bfe_i32 v{{[0-9]+}}, [[SHIFT]], 0, 8
; GCN-DAG: v_bfe_i32 v{{[0-9]+}}, v{{[0-9]+}}, 0, 8		; GCN-DAG: v_bfe_i32 v{{[0-9]+}}, v{{[0-9]+}}, 0, 8
; GCN-DAG: v_bfe_i32 v{{[0-9]+}}, v{{[0-9]+}}, 16, 8		; GCN-DAG: v_bfe_i32 v{{[0-9]+}}, v{{[0-9]+}}, 16, 8

; EG: VTX_READ_32 [[DST:T[0-9]+\.X]], T{{[0-9]+}}.X, 0, #1		; EG: VTX_READ_32 [[DST:T[0-9]+\.X]], T{{[0-9]+}}.X, 0, #1
; TODO: These should use DST, but for some there are redundant MOVs		; TODO: These should use DST, but for some there are redundant MOVs
; EG-DAG: BFE_INT {{[* ]}}T{{[0-9].[XYZW]}}, {{.}}, 0.0, literal		; EG-DAG: BFE_INT {{[* ]}}T{{[0-9].[XYZW]}}, {{.}}, 0.0, literal
; EG-DAG: BFE_INT {{[* ]}}T{{[0-9].[XYZW]}}, {{.}}, 0.0, literal		; EG-DAG: BFE_INT {{[* ]}}T{{[0-9].[XYZW]}}, {{.}}, 0.0, literal
; EG-DAG: BFE_INT {{[* ]}}T{{[0-9].[XYZW]}}, {{.}}, 0.0, literal		; EG-DAG: BFE_INT {{[* ]}}T{{[0-9].[XYZW]}}, {{.}}, 0.0, literal
[773 lines not shown]

test/CodeGen/AMDGPU/load-local-i16.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,SI,FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,VI,FUNC %s
	; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s			; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s

	; FUNC-LABEL: {{^}}local_load_i16:			; FUNC-LABEL: {{^}}local_load_i16:
	; GCN: ds_read_u16 v{{[0-9]+}}			; GCN: ds_read_u16 v{{[0-9]+}}

	; EG: MOV {{[* ]*}}[[FROM:T[0-9]+\.[XYZW]]], KC0[2].Z			; EG: MOV {{[* ]*}}[[FROM:T[0-9]+\.[XYZW]]], KC0[2].Z
	; EG: LDS_USHORT_READ_RET {{.*}} [[FROM]]			; EG: LDS_USHORT_READ_RET {{.*}} [[FROM]]
	; EG-DAG: MOV {{[* ]*}}[[DATA:T[0-9]+\.[XYZW]]], OQAP			; EG-DAG: MOV {{[* ]*}}[[DATA:T[0-9]+\.[XYZW]]], OQAP
	[523 lines not shown]
	define void @local_zextload_i16_to_i64(i64 addrspace(3)* %out, i16 addrspace(3)* %in) #0 {			define void @local_zextload_i16_to_i64(i64 addrspace(3)* %out, i16 addrspace(3)* %in) #0 {
	%a = load i16, i16 addrspace(3)* %in			%a = load i16, i16 addrspace(3)* %in
	%ext = zext i16 %a to i64			%ext = zext i16 %a to i64
	store i64 %ext, i64 addrspace(3)* %out			store i64 %ext, i64 addrspace(3)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}local_sextload_i16_to_i64:			; FUNC-LABEL: {{^}}local_sextload_i16_to_i64:
	; GCN: ds_read_i16 v[[LO:[0-9]+]],			; FIXME: Need to optimize this sequence to avoid an extra shift.
				; t25: i32,ch = load<LD2[%in(addrspace=3)], anyext from i16> t12, t10, undef:i32
				; t28: i64 = any_extend t25
				; t30: i64 = sign_extend_inreg t28, ValueType:ch:i16
				; SI: ds_read_i16 v[[LO:[0-9]+]],
				; VI: ds_read_u16 v[[ULO:[0-9]+]]
				; VI: v_bfe_i32 v[[LO:[0-9]+]], v[[ULO]], 0, 16
	; GCN-DAG: v_ashrrev_i32_e32 v[[HI:[0-9]+]], 31, v[[LO]]			; GCN-DAG: v_ashrrev_i32_e32 v[[HI:[0-9]+]], 31, v[[LO]]

	; GCN: ds_write_b64 v{{[0-9]+}}, v{{\[}}[[LO]]:[[HI]]]			; GCN: ds_write_b64 v{{[0-9]+}}, v{{\[}}[[LO]]:[[HI]]]

	; EG: MOV {{[* ]*}}[[FROM:T[0-9]+\.[XYZW]]], KC0[2].Z			; EG: MOV {{[* ]*}}[[FROM:T[0-9]+\.[XYZW]]], KC0[2].Z
	; EG: LDS_USHORT_READ_RET {{.*}} [[FROM]]			; EG: LDS_USHORT_READ_RET {{.*}} [[FROM]]
	; EG-DAG: MOV {{[* ]*}}[[TMP:T[0-9]+\.[XYZW]]], OQAP			; EG-DAG: MOV {{[* ]*}}[[TMP:T[0-9]+\.[XYZW]]], OQAP
	; EG-DAG: MOV {{[* ]*}}[[TO:T[0-9]+\.[XYZW]]], KC0[2].Y			; EG-DAG: MOV {{[* ]*}}[[TO:T[0-9]+\.[XYZW]]], KC0[2].Y
	[274 lines not shown]

test/CodeGen/AMDGPU/load-local-i8.ll

; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s		; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,SI,FUNC %s
; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=FUNC %s		; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,VI,FUNC %s
; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s		; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s


; FUNC-LABEL: {{^}}local_load_i8:		; FUNC-LABEL: {{^}}local_load_i8:
; GCN-NOT: s_wqm_b64		; GCN-NOT: s_wqm_b64
; GCN: s_mov_b32 m0		; GCN: s_mov_b32 m0
; GCN: ds_read_u8		; GCN: ds_read_u8

[125 lines not shown]	define void @local_zextload_v2i8_to_v2i32(<2 x i32> addrspace(3)* %out, <2 x i8> addrspace(3)* %in) #0 {
store <2 x i32> %ext, <2 x i32> addrspace(3)* %out		store <2 x i32> %ext, <2 x i32> addrspace(3)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}local_sextload_v2i8_to_v2i32:		; FUNC-LABEL: {{^}}local_sextload_v2i8_to_v2i32:
; GCN-NOT: s_wqm_b64		; GCN-NOT: s_wqm_b64
; GCN: s_mov_b32 m0		; GCN: s_mov_b32 m0
; GCN: ds_read_u16		; GCN: ds_read_u16
; GCN-DAG: v_bfe_i32 v{{[0-9]+}}, v{{[0-9]+}}, 8, 8		; FIXME: Need to optimize this sequence to avoid extra shift on VI.
; GCN-DAG: v_bfe_i32 v{{[0-9]+}}, v{{[0-9]+}}, 0, 8		; t23: i16 = srl t39, Constant:i32<8>
		; t31: i32 = any_extend t23
		; t33: i32 = sign_extend_inreg t31, ValueType:ch:i8

		; SI-DAG: v_bfe_i32 v{{[0-9]+}}, v{{[0-9]+}}, 8, 8
		; SI-DAG: v_bfe_i32 v{{[0-9]+}}, v{{[0-9]+}}, 0, 8

		; VI-DAG: v_lshrrev_b16_e32 [[SHIFT:v[0-9]+]], 8, v{{[0-9]+}}
		; VI-DAG: v_bfe_i32 v{{[0-9]+}}, v{{[0-9]+}}, 0, 8
		; VI-DAG: v_bfe_i32 v{{[0-9]+}}, [[SHIFT]], 0, 8

; EG: LDS_USHORT_READ_RET		; EG: LDS_USHORT_READ_RET
; EG-DAG: BFE_INT		; EG-DAG: BFE_INT
; EG-DAG: BFE_INT		; EG-DAG: BFE_INT
define void @local_sextload_v2i8_to_v2i32(<2 x i32> addrspace(3)* %out, <2 x i8> addrspace(3)* %in) #0 {		define void @local_sextload_v2i8_to_v2i32(<2 x i32> addrspace(3)* %out, <2 x i8> addrspace(3)* %in) #0 {
%load = load <2 x i8>, <2 x i8> addrspace(3)* %in		%load = load <2 x i8>, <2 x i8> addrspace(3)* %in
%ext = sext <2 x i8> %load to <2 x i32>		%ext = sext <2 x i8> %load to <2 x i32>
store <2 x i32> %ext, <2 x i32> addrspace(3)* %out		store <2 x i32> %ext, <2 x i32> addrspace(3)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}local_zextload_v3i8_to_v3i32:		; FUNC-LABEL: {{^}}local_zextload_v3i8_to_v3i32:
; GCN: ds_read_b32		; GCN: ds_read_b32

; GCN-DAG: v_bfe_u32 v{{[0-9]+}}, v{{[0-9]+}}, 8, 8		; SI-DAG: v_bfe_u32 v{{[0-9]+}}, v{{[0-9]+}}, 8, 8
		; VI-DAG: v_lshrrev_b16_e32 v{{[0-9]+}}, 8, {{v[0-9]+}}
; GCN-DAG: v_bfe_u32 v{{[0-9]+}}, v{{[0-9]+}}, 16, 8		; GCN-DAG: v_bfe_u32 v{{[0-9]+}}, v{{[0-9]+}}, 16, 8
; GCN-DAG: v_and_b32_e32 v{{[0-9]+}}, 0xff,		; GCN-DAG: v_and_b32_e32 v{{[0-9]+}}, 0xff,

; EG: LDS_READ_RET		; EG: LDS_READ_RET
define void @local_zextload_v3i8_to_v3i32(<3 x i32> addrspace(3)* %out, <3 x i8> addrspace(3)* %in) #0 {		define void @local_zextload_v3i8_to_v3i32(<3 x i32> addrspace(3)* %out, <3 x i8> addrspace(3)* %in) #0 {
entry:		entry:
%ld = load <3 x i8>, <3 x i8> addrspace(3)* %in		%ld = load <3 x i8>, <3 x i8> addrspace(3)* %in
%ext = zext <3 x i8> %ld to <3 x i32>		%ext = zext <3 x i8> %ld to <3 x i32>
[756 lines not shown]
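
The FIXME in `local_sextload_v2i8_to_v2i32` says VI extracts the second byte with `v_lshrrev_b16 ..., 8` followed by `v_bfe_i32 ..., 0, 8`, where SI uses a single `v_bfe_i32 ..., 8, 8`. A small Python sketch of why the two agree is given below. The function names are hypothetical, and the 16-bit shift is assumed to operate on the low 16 bits of its source and use the low 4 bits of the shift amount.

```python
MASK32 = 0xFFFFFFFF


def bfe_i32(src, offset, width):
    """Signed bitfield extract: take `width` bits at `offset`, sign-extend to 32 bits."""
    field = (src >> offset) & ((1 << width) - 1)
    if field & (1 << (width - 1)):
        field -= 1 << width
    return field & MASK32


def lshrrev_b16(shift, src):
    """Assumed model of a 16-bit logical shift right (shift amount is the first operand)."""
    return (src & 0xFFFF) >> (shift & 0xF)


def second_byte_sext_si(loaded):
    """SI form: one bitfield extract at bit offset 8."""
    return bfe_i32(loaded, 8, 8)


def second_byte_sext_vi(loaded):
    """VI form: 16-bit shift right by 8, then bitfield extract at offset 0."""
    return bfe_i32(lshrrev_b16(8, loaded), 0, 8)
```

Because the value comes from `ds_read_u16`, only 16-bit inputs matter, so the two forms can be compared over every possible loaded value.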

test/CodeGen/AMDGPU/mad_uint24.ll

	; RUN: llc < %s -march=amdgcn -verify-machineinstrs | FileCheck %s --check-prefix=SI --check-prefix=FUNC
	; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s --check-prefix=SI --check-prefix=FUNC
	; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s --check-prefix=EG --check-prefix=FUNC			; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s --check-prefix=EG --check-prefix=FUNC
	; RUN: llc < %s -march=r600 -mcpu=cayman | FileCheck %s --check-prefix=EG --check-prefix=FUNC			; RUN: llc < %s -march=r600 -mcpu=cayman | FileCheck %s --check-prefix=EG --check-prefix=FUNC
				; RUN: llc < %s -march=amdgcn -mcpu=SI -verify-machineinstrs | FileCheck %s --check-prefix=SI --check-prefix=FUNC
				; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s --check-prefix=VI --check-prefix=FUNC
				; RUN: llc < %s -march=amdgcn -mcpu=fiji -verify-machineinstrs | FileCheck %s --check-prefix=VI --check-prefix=FUNC

				declare i32 @llvm.r600.read.tidig.x() nounwind readnone

	; FUNC-LABEL: {{^}}u32_mad24:			; FUNC-LABEL: {{^}}u32_mad24:
	; EG: MULADD_UINT24			; EG: MULADD_UINT24
	; SI: v_mad_u32_u24			; SI: v_mad_u32_u24
				; VI: v_mad_u32_u24

	define void @u32_mad24(i32 addrspace(1)* %out, i32 %a, i32 %b, i32 %c) {			define void @u32_mad24(i32 addrspace(1)* %out, i32 %a, i32 %b, i32 %c) {
	entry:			entry:
	%0 = shl i32 %a, 8			%0 = shl i32 %a, 8
	%a_24 = lshr i32 %0, 8			%a_24 = lshr i32 %0, 8
	%1 = shl i32 %b, 8			%1 = shl i32 %b, 8
	%b_24 = lshr i32 %1, 8			%b_24 = lshr i32 %1, 8
	%2 = mul i32 %a_24, %b_24			%2 = mul i32 %a_24, %b_24
	%3 = add i32 %2, %c			%3 = add i32 %2, %c
	store i32 %3, i32 addrspace(1)* %out			store i32 %3, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}i16_mad24:			; FUNC-LABEL: {{^}}i16_mad24:
	; The order of A and B does not matter.			; The order of A and B does not matter.
	; EG: MULADD_UINT24 {{[* ]*}}T{{[0-9]}}.[[MAD_CHAN:[XYZW]]]			; EG: MULADD_UINT24 {{[* ]*}}T{{[0-9]}}.[[MAD_CHAN:[XYZW]]]
	; The result must be sign-extended			; The result must be sign-extended
	; EG: BFE_INT {{[* ]*}}T{{[0-9]\.[XYZW]}}, PV.[[MAD_CHAN]], 0.0, literal.x			; EG: BFE_INT {{[* ]*}}T{{[0-9]\.[XYZW]}}, PV.[[MAD_CHAN]], 0.0, literal.x
	; EG: 16			; EG: 16
	; SI: v_mad_u32_u24 [[MAD:v[0-9]]], {{[sv][0-9], [sv][0-9]}}			; FIXME: Should be using scalar instructions here.
	; SI: v_bfe_i32 v{{[0-9]}}, [[MAD]], 0, 16			; GCN: v_mad_u32_u24 [[MAD:v[0-9]]], {{[sv][0-9], [sv][0-9]}}
				; GCN: v_bfe_i32 v{{[0-9]}}, [[MAD]], 0, 16
	define void @i16_mad24(i32 addrspace(1)* %out, i16 %a, i16 %b, i16 %c) {			define void @i16_mad24(i32 addrspace(1)* %out, i16 %a, i16 %b, i16 %c) {
	entry:			entry:
	%0 = mul i16 %a, %b			%0 = mul i16 %a, %b
	%1 = add i16 %0, %c			%1 = add i16 %0, %c
	%2 = sext i16 %1 to i32			%2 = sext i16 %1 to i32
	store i32 %2, i32 addrspace(1)* %out			store i32 %2, i32 addrspace(1)* %out
	ret void			ret void
	}			}

				; FIXME: Need to handle non-uniform case for function below (load without gep).
	; FUNC-LABEL: {{^}}i8_mad24:			; FUNC-LABEL: {{^}}i8_mad24:
	; EG: MULADD_UINT24 {{[* ]*}}T{{[0-9]}}.[[MAD_CHAN:[XYZW]]]			; EG: MULADD_UINT24 {{[* ]*}}T{{[0-9]}}.[[MAD_CHAN:[XYZW]]]
	; The result must be sign-extended			; The result must be sign-extended
	; EG: BFE_INT {{[* ]*}}T{{[0-9]\.[XYZW]}}, PV.[[MAD_CHAN]], 0.0, literal.x			; EG: BFE_INT {{[* ]*}}T{{[0-9]\.[XYZW]}}, PV.[[MAD_CHAN]], 0.0, literal.x
	; EG: 8			; EG: 8
	; SI: v_mad_u32_u24 [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}			; GCN: v_mad_u32_u24 [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}
	; SI: v_bfe_i32 v{{[0-9]}}, [[MUL]], 0, 8			; GCN: v_bfe_i32 v{{[0-9]}}, [[MUL]], 0, 8

	define void @i8_mad24(i32 addrspace(1)* %out, i8 %a, i8 %b, i8 %c) {			define void @i8_mad24(i32 addrspace(1)* %out, i8 %a, i8 %b, i8 %c) {
	entry:			entry:
	%0 = mul i8 %a, %b			%0 = mul i8 %a, %b
	%1 = add i8 %0, %c			%1 = add i8 %0, %c
	%2 = sext i8 %1 to i32			%2 = sext i8 %1 to i32
	store i32 %2, i32 addrspace(1)* %out			store i32 %2, i32 addrspace(1)* %out
	ret void			ret void
	}			}
	[81 lines not shown]
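
The `i16_mad24` checks expect `v_mad_u32_u24` followed by `v_bfe_i32 ..., 0, 16` for IR that does `mul i16`, `add i16`, then `sext` to i32. A minimal Python sketch of why that lowering matches the IR is given below. The helper names are hypothetical, and `v_mad_u32_u24` is assumed to multiply the low 24 bits of its first two operands and add the third, keeping the low 32 bits.

```python
MASK32 = 0xFFFFFFFF


def sext(value, bits):
    """Interpret the low `bits` bits of `value` as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mad_u32_u24(a, b, c):
    """Assumed model: 24-bit unsigned multiply plus 32-bit add, truncated to 32 bits."""
    return ((a & 0xFFFFFF) * (b & 0xFFFFFF) + c) & MASK32


def bfe_i32(src, offset, width):
    """Signed bitfield extract: take `width` bits at `offset`, sign-extend to 32 bits."""
    field = (src >> offset) & ((1 << width) - 1)
    if field & (1 << (width - 1)):
        field -= 1 << width
    return field & MASK32


def i16_mad24_reference(a, b, c):
    """IR semantics: wrap a*b + c to 16 bits, then sign-extend."""
    return sext(a * b + c, 16)


def i16_mad24_lowered(a, b, c):
    """Checked sequence: mad_u32_u24, then bfe_i32 0, 16."""
    return sext(bfe_i32(mad_u32_u24(a, b, c), 0, 16), 32)
```

The low 16 bits of a product and sum depend only on the low 16 bits of the operands, which is why the wider multiply-add can stand in for the 16-bit one as long as the result is re-extended afterwards.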

test/CodeGen/AMDGPU/max.i16.ll

This file was added.

				; RUN: llc < %s -march=amdgcn -mcpu=fiji -verify-machineinstrs | FileCheck -check-prefix=GCN -check-prefix=VI %s


				declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone

				; FIXME: Need to handle non-uniform case for function below (load without gep).
				; GCN-LABEL: {{^}}v_test_imax_sge_i16:
				; VI: v_max_i16_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
				define void @v_test_imax_sge_i16(i16 addrspace(1)* %out, i16 addrspace(1)* %aptr, i16 addrspace(1)* %bptr) nounwind {
				%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
				%gep0 = getelementptr i16, i16 addrspace(1)* %aptr, i32 %tid
				%gep1 = getelementptr i16, i16 addrspace(1)* %bptr, i32 %tid
				%outgep = getelementptr i16, i16 addrspace(1)* %out, i32 %tid
				%a = load i16, i16 addrspace(1)* %gep0, align 4
				%b = load i16, i16 addrspace(1)* %gep1, align 4
				%cmp = icmp sge i16 %a, %b
				%val = select i1 %cmp, i16 %a, i16 %b
				store i16 %val, i16 addrspace(1)* %outgep, align 4
				ret void
				}

				; FIXME: Need to handle non-uniform case for function below (load without gep).
				; GCN-LABEL: {{^}}v_test_imax_sge_v4i16:
				; VI: v_max_i16_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
				; VI: v_max_i16_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
				; VI: v_max_i16_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
				; VI: v_max_i16_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
				define void @v_test_imax_sge_v4i16(<4 x i16> addrspace(1)* %out, <4 x i16> addrspace(1)* %aptr, <4 x i16> addrspace(1)* %bptr) nounwind {
				%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
				%gep0 = getelementptr <4 x i16>, <4 x i16> addrspace(1)* %aptr, i32 %tid
				%gep1 = getelementptr <4 x i16>, <4 x i16> addrspace(1)* %bptr, i32 %tid
				%outgep = getelementptr <4 x i16>, <4 x i16> addrspace(1)* %out, i32 %tid
				%a = load <4 x i16>, <4 x i16> addrspace(1)* %gep0, align 4
				%b = load <4 x i16>, <4 x i16> addrspace(1)* %gep1, align 4
				%cmp = icmp sge <4 x i16> %a, %b
				%val = select <4 x i1> %cmp, <4 x i16> %a, <4 x i16> %b
				store <4 x i16> %val, <4 x i16> addrspace(1)* %outgep, align 4
				ret void
				}

				; FIXME: Need to handle non-uniform case for function below (load without gep).
				; GCN-LABEL: {{^}}v_test_imax_sgt_i16:
				; VI: v_max_i16_e32
				define void @v_test_imax_sgt_i16(i16 addrspace(1)* %out, i16 addrspace(1)* %aptr, i16 addrspace(1)* %bptr) nounwind {
				%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
				%gep0 = getelementptr i16, i16 addrspace(1)* %aptr, i32 %tid
				%gep1 = getelementptr i16, i16 addrspace(1)* %bptr, i32 %tid
				%outgep = getelementptr i16, i16 addrspace(1)* %out, i32 %tid
				%a = load i16, i16 addrspace(1)* %gep0, align 4
				%b = load i16, i16 addrspace(1)* %gep1, align 4
				%cmp = icmp sgt i16 %a, %b
				%val = select i1 %cmp, i16 %a, i16 %b
				store i16 %val, i16 addrspace(1)* %outgep, align 4
				ret void
				}

				; FIXME: Need to handle non-uniform case for function below (load without gep).
				; GCN-LABEL: {{^}}v_test_umax_uge_i16:
				; VI: v_max_u16_e32
				define void @v_test_umax_uge_i16(i16 addrspace(1)* %out, i16 addrspace(1)* %aptr, i16 addrspace(1)* %bptr) nounwind {
				%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
				%gep0 = getelementptr i16, i16 addrspace(1)* %aptr, i32 %tid
				%gep1 = getelementptr i16, i16 addrspace(1)* %bptr, i32 %tid
				%outgep = getelementptr i16, i16 addrspace(1)* %out, i32 %tid
				%a = load i16, i16 addrspace(1)* %gep0, align 4
				%b = load i16, i16 addrspace(1)* %gep1, align 4
				%cmp = icmp uge i16 %a, %b
				%val = select i1 %cmp, i16 %a, i16 %b
				store i16 %val, i16 addrspace(1)* %outgep, align 4
				ret void
				}

				; FIXME: Need to handle non-uniform case for function below (load without gep).
				; GCN-LABEL: {{^}}v_test_umax_ugt_i16:
				; VI: v_max_u16_e32
				define void @v_test_umax_ugt_i16(i16 addrspace(1)* %out, i16 addrspace(1)* %aptr, i16 addrspace(1)* %bptr) nounwind {
				%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
				%gep0 = getelementptr i16, i16 addrspace(1)* %aptr, i32 %tid
				%gep1 = getelementptr i16, i16 addrspace(1)* %bptr, i32 %tid
				%outgep = getelementptr i16, i16 addrspace(1)* %out, i32 %tid
				%a = load i16, i16 addrspace(1)* %gep0, align 4
				%b = load i16, i16 addrspace(1)* %gep1, align 4
				%cmp = icmp ugt i16 %a, %b
				%val = select i1 %cmp, i16 %a, i16 %b
				store i16 %val, i16 addrspace(1)* %outgep, align 4
				ret void
				}
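
The tests in this file pair `icmp sge`/`sgt` plus `select` with `v_max_i16`, and `icmp uge`/`ugt` plus `select` with `v_max_u16`. The Python sketch below illustrates why the signedness of the compare has to pick the signedness of the max. All function names are hypothetical and values are treated as raw 16-bit patterns.

```python
def to_signed16(value):
    """Interpret a 16-bit pattern as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def select_icmp(pred, a, b):
    """Model of `select (icmp pred a, b), a, b` on 16-bit patterns."""
    if pred in ("sge", "sgt"):
        lhs, rhs = to_signed16(a), to_signed16(b)
    else:
        lhs, rhs = a & 0xFFFF, b & 0xFFFF
    taken = lhs >= rhs if pred in ("sge", "uge") else lhs > rhs
    return (a if taken else b) & 0xFFFF


def max_i16(a, b):
    """Signed 16-bit max, returned as a 16-bit pattern."""
    return max(to_signed16(a), to_signed16(b)) & 0xFFFF


def max_u16(a, b):
    """Unsigned 16-bit max."""
    return max(a & 0xFFFF, b & 0xFFFF)
```

For example, with the patterns `0x8000` and `0x0001` the signed max is `0x0001` while the unsigned max is `0x8000`, so matching a signed compare to the unsigned instruction (or the reverse) would give a wrong result whenever the sign bits differ.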

test/CodeGen/AMDGPU/mul_uint24-amdgcn.ll

	[25 lines not shown]
	entry:			entry:
	%mul = mul i16 %a, %b			%mul = mul i16 %a, %b
	%ext = sext i16 %mul to i32			%ext = sext i16 %mul to i32
	store i32 %ext, i32 addrspace(1)* %out			store i32 %ext, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}test_umul24_i16_vgpr_sext:			; FUNC-LABEL: {{^}}test_umul24_i16_vgpr_sext:
	; GCN: v_mul_u32_u24_e{{(32|64)}} [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}			; SI: v_mul_u32_u24_e{{(32|64)}} [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}
				; VI: v_mul_lo_u16_e{{(32|64)}} [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}
	; GCN: v_bfe_i32 v{{[0-9]}}, [[MUL]], 0, 16			; GCN: v_bfe_i32 v{{[0-9]}}, [[MUL]], 0, 16
	define void @test_umul24_i16_vgpr_sext(i32 addrspace(1)* %out, i16 addrspace(1)* %in) {			define void @test_umul24_i16_vgpr_sext(i32 addrspace(1)* %out, i16 addrspace(1)* %in) {
	%tid.x = call i32 @llvm.amdgcn.workitem.id.x()			%tid.x = call i32 @llvm.amdgcn.workitem.id.x()
	%tid.y = call i32 @llvm.amdgcn.workitem.id.y()			%tid.y = call i32 @llvm.amdgcn.workitem.id.y()
	%ptr_a = getelementptr i16, i16 addrspace(1)* %in, i32 %tid.x			%ptr_a = getelementptr i16, i16 addrspace(1)* %in, i32 %tid.x
	%ptr_b = getelementptr i16, i16 addrspace(1)* %in, i32 %tid.y			%ptr_b = getelementptr i16, i16 addrspace(1)* %in, i32 %tid.y
	%a = load i16, i16 addrspace(1)* %ptr_a			%a = load i16, i16 addrspace(1)* %ptr_a
	%b = load i16, i16 addrspace(1)* %ptr_b			%b = load i16, i16 addrspace(1)* %ptr_b
	[14 lines not shown]
	entry:			entry:
	%mul = mul i16 %a, %b			%mul = mul i16 %a, %b
	%ext = zext i16 %mul to i32			%ext = zext i16 %mul to i32
	store i32 %ext, i32 addrspace(1)* %out			store i32 %ext, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}test_umul24_i16_vgpr:			; FUNC-LABEL: {{^}}test_umul24_i16_vgpr:
	; GCN: v_mul_u32_u24_e32			; SI: v_mul_u32_u24_e32
	; GCN: v_and_b32_e32			; SI: v_and_b32_e32
				; VI: v_mul_lo_u16
	define void @test_umul24_i16_vgpr(i32 addrspace(1)* %out, i16 addrspace(1)* %in) {			define void @test_umul24_i16_vgpr(i32 addrspace(1)* %out, i16 addrspace(1)* %in) {
	%tid.x = call i32 @llvm.amdgcn.workitem.id.x()			%tid.x = call i32 @llvm.amdgcn.workitem.id.x()
	%tid.y = call i32 @llvm.amdgcn.workitem.id.y()			%tid.y = call i32 @llvm.amdgcn.workitem.id.y()
	%ptr_a = getelementptr i16, i16 addrspace(1)* %in, i32 %tid.x			%ptr_a = getelementptr i16, i16 addrspace(1)* %in, i32 %tid.x
	%ptr_b = getelementptr i16, i16 addrspace(1)* %in, i32 %tid.y			%ptr_b = getelementptr i16, i16 addrspace(1)* %in, i32 %tid.y
	%a = load i16, i16 addrspace(1)* %ptr_a			%a = load i16, i16 addrspace(1)* %ptr_a
	%b = load i16, i16 addrspace(1)* %ptr_b			%b = load i16, i16 addrspace(1)* %ptr_b
	%mul = mul i16 %a, %b			%mul = mul i16 %a, %b
	%val = zext i16 %mul to i32			%val = zext i16 %mul to i32
	store i32 %val, i32 addrspace(1)* %out			store i32 %val, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; FIXME: Need to handle non-uniform case for function below (load without gep).
	; FUNC-LABEL: {{^}}test_umul24_i8_vgpr:			; FUNC-LABEL: {{^}}test_umul24_i8_vgpr:
	; GCN: v_mul_u32_u24_e{{(32|64)}} [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}			; SI: v_mul_u32_u24_e{{(32|64)}} [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}
				; VI: v_mul_lo_u16_e{{(32|64)}} [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}
	; GCN: v_bfe_i32 v{{[0-9]}}, [[MUL]], 0, 8			; GCN: v_bfe_i32 v{{[0-9]}}, [[MUL]], 0, 8
	define void @test_umul24_i8_vgpr(i32 addrspace(1)* %out, i8 addrspace(1)* %a, i8 addrspace(1)* %b) {			define void @test_umul24_i8_vgpr(i32 addrspace(1)* %out, i8 addrspace(1)* %a, i8 addrspace(1)* %b) {
	entry:			entry:
	%tid.x = call i32 @llvm.amdgcn.workitem.id.x()			%tid.x = call i32 @llvm.amdgcn.workitem.id.x()
	%tid.y = call i32 @llvm.amdgcn.workitem.id.y()			%tid.y = call i32 @llvm.amdgcn.workitem.id.y()
	%a.ptr = getelementptr i8, i8 addrspace(1)* %a, i32 %tid.x			%a.ptr = getelementptr i8, i8 addrspace(1)* %a, i32 %tid.x
	%b.ptr = getelementptr i8, i8 addrspace(1)* %b, i32 %tid.y			%b.ptr = getelementptr i8, i8 addrspace(1)* %b, i32 %tid.y
	%a.l = load i8, i8 addrspace(1)* %a.ptr			%a.l = load i8, i8 addrspace(1)* %a.ptr
	[128 lines not shown]

test/CodeGen/AMDGPU/shl.ll

[47 lines not shown]	define void @shl_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
%b_ptr = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %in, i32 1		%b_ptr = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %in, i32 1
%a = load <4 x i32>, <4 x i32> addrspace(1) * %in		%a = load <4 x i32>, <4 x i32> addrspace(1) * %in
%b = load <4 x i32>, <4 x i32> addrspace(1) * %b_ptr		%b = load <4 x i32>, <4 x i32> addrspace(1) * %b_ptr
%result = shl <4 x i32> %a, %b		%result = shl <4 x i32> %a, %b
store <4 x i32> %result, <4 x i32> addrspace(1)* %out		store <4 x i32> %result, <4 x i32> addrspace(1)* %out
ret void		ret void
}		}

		;VI: {{^}}shl_i16:
		;VI: v_lshlrev_b16_e32 v{{[0-9]+, [0-9]+, [0-9]+}}

		define void @shl_i16(i16 addrspace(1)* %out, i16 addrspace(1)* %in) {
		%b_ptr = getelementptr i16, i16 addrspace(1)* %in, i16 1
		%a = load i16, i16 addrspace(1) * %in
		%b = load i16, i16 addrspace(1) * %b_ptr
		%result = shl i16 %a, %b
		store i16 %result, i16 addrspace(1)* %out
		ret void
		}


		;VI: {{^}}shl_v2i16:
		;VI: v_lshlrev_b16_e32 v{{[0-9]+, [0-9]+, [0-9]+}}
		;VI: v_lshlrev_b16_e32 v{{[0-9]+, [0-9]+, [0-9]+}}

		define void @shl_v2i16(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in) {
		%b_ptr = getelementptr <2 x i16>, <2 x i16> addrspace(1)* %in, i16 1
		%a = load <2 x i16>, <2 x i16> addrspace(1) * %in
		%b = load <2 x i16>, <2 x i16> addrspace(1) * %b_ptr
		%result = shl <2 x i16> %a, %b
		store <2 x i16> %result, <2 x i16> addrspace(1)* %out
		ret void
		}


		;VI: {{^}}shl_v4i16:
		;VI: v_lshlrev_b16_e32 v{{[0-9]+, [0-9]+, [0-9]+}}
		;VI: v_lshlrev_b16_e32 v{{[0-9]+, [0-9]+, [0-9]+}}
		;VI: v_lshlrev_b16_e32 v{{[0-9]+, [0-9]+, [0-9]+}}
		;VI: v_lshlrev_b16_e32 v{{[0-9]+, [0-9]+, [0-9]+}}

		define void @shl_v4i16(<4 x i16> addrspace(1)* %out, <4 x i16> addrspace(1)* %in) {
		%b_ptr = getelementptr <4 x i16>, <4 x i16> addrspace(1)* %in, i16 1
		%a = load <4 x i16>, <4 x i16> addrspace(1) * %in
		%b = load <4 x i16>, <4 x i16> addrspace(1) * %b_ptr
		%result = shl <4 x i16> %a, %b
		store <4 x i16> %result, <4 x i16> addrspace(1)* %out
		ret void
		}

;EG-LABEL: {{^}}shl_i64:		;EG-LABEL: {{^}}shl_i64:
;EG: SUB_INT {{\? }}[[COMPSH:T[0-9]+\.[XYZW]]], {{literal.[xy]}}, [[SHIFT:T[0-9]+\.[XYZW]]]		;EG: SUB_INT {{\? }}[[COMPSH:T[0-9]+\.[XYZW]]], {{literal.[xy]}}, [[SHIFT:T[0-9]+\.[XYZW]]]
;EG: LSHR {{\* *}}[[TEMP:T[0-9]+\.[XYZW]]], [[OPLO:T[0-9]+\.[XYZW]]], {{[[COMPSH]]|PV.[XYZW]}}		;EG: LSHR {{\* *}}[[TEMP:T[0-9]+\.[XYZW]]], [[OPLO:T[0-9]+\.[XYZW]]], {{[[COMPSH]]|PV.[XYZW]}}
;EG-DAG: ADD_INT {{\? }}[[BIGSH:T[0-9]+\.[XYZW]]], [[SHIFT]], literal		;EG-DAG: ADD_INT {{\? }}[[BIGSH:T[0-9]+\.[XYZW]]], [[SHIFT]], literal
;EG-DAG: LSHR {{\? }}[[OVERF:T[0-9]+\.[XYZW]]], {{[[TEMP]]|PV.[XYZW]}}, 1		;EG-DAG: LSHR {{\? }}[[OVERF:T[0-9]+\.[XYZW]]], {{[[TEMP]]|PV.[XYZW]}}, 1
;EG-DAG: LSHL {{\? }}[[HISMTMP:T[0-9]+\.[XYZW]]], [[OPHI:T[0-9]+\.[XYZW]]], [[SHIFT]]		;EG-DAG: LSHL {{\? }}[[HISMTMP:T[0-9]+\.[XYZW]]], [[OPHI:T[0-9]+\.[XYZW]]], [[SHIFT]]
;EG-DAG: OR_INT {{\? }}[[HISM:T[0-9]+\.[XYZW]]], {{[[HISMTMP]]|PV.[XYZW]|PS}}, {{[[OVERF]]|PV.[XYZW]}}		;EG-DAG: OR_INT {{\? }}[[HISM:T[0-9]+\.[XYZW]]], {{[[HISMTMP]]|PV.[XYZW]|PS}}, {{[[OVERF]]|PV.[XYZW]}}
;EG-DAG: LSHL {{\? }}[[LOSM:T[0-9]+\.[XYZW]]], [[OPLO]], {{PS|[[SHIFT]]|PV.[XYZW]}}		;EG-DAG: LSHL {{\? }}[[LOSM:T[0-9]+\.[XYZW]]], [[OPLO]], {{PS|[[SHIFT]]|PV.[XYZW]}}
[317 lines not shown]

test/CodeGen/AMDGPU/sign_extend.ll

; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=SI %s		; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,SI %s
; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=VI %s		; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,VI %s

; GCN-LABEL: {{^}}s_sext_i1_to_i32:		; GCN-LABEL: {{^}}s_sext_i1_to_i32:
; GCN: v_cndmask_b32_e64		; GCN: v_cndmask_b32_e64
; GCN: s_endpgm		; GCN: s_endpgm
define void @s_sext_i1_to_i32(i32 addrspace(1)* %out, i32 %a, i32 %b) nounwind {		define void @s_sext_i1_to_i32(i32 addrspace(1)* %out, i32 %a, i32 %b) nounwind {
%cmp = icmp eq i32 %a, %b		%cmp = icmp eq i32 %a, %b
%sext = sext i1 %cmp to i32		%sext = sext i1 %cmp to i32
store i32 %sext, i32 addrspace(1)* %out, align 4		store i32 %sext, i32 addrspace(1)* %out, align 4
[39 lines not shown]
define void @v_sext_i32_to_i64(i64 addrspace(1)* %out, i32 addrspace(1)* %in) nounwind {		define void @v_sext_i32_to_i64(i64 addrspace(1)* %out, i32 addrspace(1)* %in) nounwind {
%val = load i32, i32 addrspace(1)* %in, align 4		%val = load i32, i32 addrspace(1)* %in, align 4
%sext = sext i32 %val to i64		%sext = sext i32 %val to i64
store i64 %sext, i64 addrspace(1)* %out, align 8		store i64 %sext, i64 addrspace(1)* %out, align 8
ret void		ret void
}		}

; GCN-LABEL: {{^}}s_sext_i16_to_i64:		; GCN-LABEL: {{^}}s_sext_i16_to_i64:
; GCN: s_endpgm		; GCN: s_bfe_i64 s{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0x100000
define void @s_sext_i16_to_i64(i64 addrspace(1)* %out, i16 %a) nounwind {		define void @s_sext_i16_to_i64(i64 addrspace(1)* %out, i16 %a) nounwind {
%sext = sext i16 %a to i64		%sext = sext i16 %a to i64
store i64 %sext, i64 addrspace(1)* %out, align 8		store i64 %sext, i64 addrspace(1)* %out, align 8
ret void		ret void
}		}

		; GCN-LABEL: {{^}}s_sext_i1_to_i16:
		; GCN: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, -1
		; GCN-NEXT: buffer_store_short [[RESULT]]
		define void @s_sext_i1_to_i16(i16 addrspace(1)* %out, i32 %a, i32 %b) nounwind {
		%cmp = icmp eq i32 %a, %b
		%sext = sext i1 %cmp to i16
		store i16 %sext, i16 addrspace(1)* %out
		ret void
		}
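
Note on the v_cndmask checks: the new s_sext_i1_to_i16 test expects the select constants 0 and -1, so a true compare stores 0xFFFF as the 16-bit result, whereas a zero-extend would select 0 or 1. A minimal Python model of that select is sketched below. The helper names are hypothetical, and the model ignores lanes, registers, and the actual store; it only illustrates the value being selected.

```python
def cndmask(src0, src1, cond):
    """Toy model of a conditional select: src1 when cond is set, else src0."""
    return src1 if cond else src0


def sext_i1_to_i16(bit):
    # sext i1 -> i16: true becomes all ones, false becomes zero.
    # The mask keeps only the 16 bits that a short store would write.
    return cndmask(0, -1, bit) & 0xFFFF


def zext_i1_to_i16(bit):
    # zext i1 -> i16: true becomes 1, false becomes zero.
    return cndmask(0, 1, bit) & 0xFFFF
```

The only difference between the two extensions in this model is the constant chosen for the true case.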

; GCN-LABEL: {{^}}s_sext_v4i8_to_v4i32:		; GCN-LABEL: {{^}}s_sext_v4i8_to_v4i32:
; GCN: s_load_dword [[VAL:s[0-9]+]]		; GCN: s_load_dword [[VAL:s[0-9]+]]
; GCN-DAG: s_sext_i32_i8 [[EXT0:s[0-9]+]], [[VAL]]
; GCN-DAG: s_bfe_i32 [[EXT1:s[0-9]+]], [[VAL]], 0x80008
; GCN-DAG: s_bfe_i32 [[EXT2:s[0-9]+]], [[VAL]], 0x80010		; GCN-DAG: s_bfe_i32 [[EXT2:s[0-9]+]], [[VAL]], 0x80010
; GCN-DAG: s_ashr_i32 [[EXT3:s[0-9]+]], [[VAL]], 24		; GCN-DAG: s_ashr_i32 [[EXT3:s[0-9]+]], [[VAL]], 24
		; SI-DAG: s_bfe_i32 [[EXT1:s[0-9]+]], [[VAL]], 0x80008
		; GCN-DAG: s_sext_i32_i8 [[EXT0:s[0-9]+]], [[VAL]]

		; FIXME: We end up with a v_bfe instruction, because the i16 srl
		; gets selected to a v_lshrrev_b16 instructions, so the input to
		; the bfe is a vector registers. To fix this we need to be able to
		; optimize:
		; t29: i16 = truncate t10
		; t55: i16 = srl t29, Constant:i32<8>
		; t63: i32 = any_extend t55
		; t64: i32 = sign_extend_inreg t63, ValueType:ch:i8

		; VI-DAG: v_bfe_i32 [[VEXT1:v[0-9]+]], v{{[0-9]+}}, 0, 8

; GCN-DAG: v_mov_b32_e32 [[VEXT0:v[0-9]+]], [[EXT0]]		; GCN-DAG: v_mov_b32_e32 [[VEXT0:v[0-9]+]], [[EXT0]]
; GCN-DAG: v_mov_b32_e32 [[VEXT1:v[0-9]+]], [[EXT1]]		; SI-DAG: v_mov_b32_e32 [[VEXT1:v[0-9]+]], [[EXT1]]
; GCN-DAG: v_mov_b32_e32 [[VEXT2:v[0-9]+]], [[EXT2]]		; GCN-DAG: v_mov_b32_e32 [[VEXT2:v[0-9]+]], [[EXT2]]
; GCN-DAG: v_mov_b32_e32 [[VEXT3:v[0-9]+]], [[EXT3]]		; GCN-DAG: v_mov_b32_e32 [[VEXT3:v[0-9]+]], [[EXT3]]

; GCN-DAG: buffer_store_dword [[VEXT0]]		; GCN-DAG: buffer_store_dword [[VEXT0]]
; GCN-DAG: buffer_store_dword [[VEXT1]]		; GCN-DAG: buffer_store_dword [[VEXT1]]
; GCN-DAG: buffer_store_dword [[VEXT2]]		; GCN-DAG: buffer_store_dword [[VEXT2]]
; GCN-DAG: buffer_store_dword [[VEXT3]]		; GCN-DAG: buffer_store_dword [[VEXT3]]

[9 lines not shown]	define void @s_sext_v4i8_to_v4i32(i32 addrspace(1)* %out, i32 %a) nounwind {
store volatile i32 %elt1, i32 addrspace(1)* %out		store volatile i32 %elt1, i32 addrspace(1)* %out
store volatile i32 %elt2, i32 addrspace(1)* %out		store volatile i32 %elt2, i32 addrspace(1)* %out
store volatile i32 %elt3, i32 addrspace(1)* %out		store volatile i32 %elt3, i32 addrspace(1)* %out
ret void		ret void
}		}
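
Note on the s_bfe_i32 immediates: the constants 0x80008 and 0x80010 in the checks above pack a field offset and a field width into one operand. The sketch below decodes them under the assumption that the offset sits in bits [4:0] and the width in bits [22:16], which is how I read the ISA documentation; verify the bit positions against the manual before relying on them. The function names are hypothetical.

```python
def decode_bfe_operand(imm):
    """Split a 32-bit BFE control operand into (offset, width).

    Assumed layout: offset in bits [4:0], width in bits [22:16].
    """
    offset = imm & 0x1F
    width = (imm >> 16) & 0x7F
    return offset, width


def bfe_i32(value, imm):
    """Signed bitfield extract on a 32-bit value, modelled in Python."""
    offset, width = decode_bfe_operand(imm)
    if width == 0:
        return 0
    field = ((value & 0xFFFFFFFF) >> offset) & ((1 << width) - 1)
    sign_bit = 1 << (width - 1)
    # XOR-and-subtract sign-extends the extracted field.
    return (field ^ sign_bit) - sign_bit
```

Under that assumption, 0x80008 selects 8 bits starting at bit 8 (the second byte) and 0x80010 selects 8 bits starting at bit 16 (the third byte), matching the elements of the <4 x i8> being sign-extended.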

; GCN-LABEL: {{^}}v_sext_v4i8_to_v4i32:		; GCN-LABEL: {{^}}v_sext_v4i8_to_v4i32:
; GCN: buffer_load_dword [[VAL:v[0-9]+]]		; GCN: buffer_load_dword [[VAL:v[0-9]+]]
; GCN-DAG: v_bfe_i32 [[EXT0:v[0-9]+]], [[VAL]], 0, 8		; FIXME: need to optimize same sequence as above test to avoid
; GCN-DAG: v_bfe_i32 [[EXT1:v[0-9]+]], [[VAL]], 8, 8		; this shift.
; GCN-DAG: v_bfe_i32 [[EXT2:v[0-9]+]], [[VAL]], 16, 8		; VI-DAG: v_lshrrev_b16_e32 [[SH16:v[0-9]+]], 8, [[VAL]]
; GCN-DAG: v_ashrrev_i32_e32 [[EXT3:v[0-9]+]], 24, [[VAL]]		; GCN-DAG: v_ashrrev_i32_e32 [[EXT3:v[0-9]+]], 24, [[VAL]]
		; VI-DAG: v_bfe_i32 [[EXT0:v[0-9]+]], [[VAL]], 0, 8
		; VI-DAG: v_bfe_i32 [[EXT2:v[0-9]+]], [[VAL]], 16, 8
		; VI-DAG: v_bfe_i32 [[EXT1:v[0-9]+]], [[SH16]], 0, 8

		; SI-DAG: v_bfe_i32 [[EXT2:v[0-9]+]], [[VAL]], 16, 8
		; SI-DAG: v_bfe_i32 [[EXT1:v[0-9]+]], [[VAL]], 8, 8
		; SI: v_bfe_i32 [[EXT0:v[0-9]+]], [[VAL]], 0, 8

; GCN: buffer_store_dword [[EXT0]]		; GCN: buffer_store_dword [[EXT0]]
; GCN: buffer_store_dword [[EXT1]]		; GCN: buffer_store_dword [[EXT1]]
; GCN: buffer_store_dword [[EXT2]]		; GCN: buffer_store_dword [[EXT2]]
; GCN: buffer_store_dword [[EXT3]]		; GCN: buffer_store_dword [[EXT3]]
define void @v_sext_v4i8_to_v4i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in) nounwind {		define void @v_sext_v4i8_to_v4i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in) nounwind {
%a = load i32, i32 addrspace(1)* %in		%a = load i32, i32 addrspace(1)* %in
%cast = bitcast i32 %a to <4 x i8>		%cast = bitcast i32 %a to <4 x i8>
[55 lines not shown]

test/CodeGen/AMDGPU/sra.ll

[40 lines not shown]	define void @ashr_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
%b_ptr = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %in, i32 1		%b_ptr = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %in, i32 1
%a = load <4 x i32>, <4 x i32> addrspace(1)* %in		%a = load <4 x i32>, <4 x i32> addrspace(1)* %in
%b = load <4 x i32>, <4 x i32> addrspace(1)* %b_ptr		%b = load <4 x i32>, <4 x i32> addrspace(1)* %b_ptr
%result = ashr <4 x i32> %a, %b		%result = ashr <4 x i32> %a, %b
store <4 x i32> %result, <4 x i32> addrspace(1)* %out		store <4 x i32> %result, <4 x i32> addrspace(1)* %out
ret void		ret void
}		}

		; FUNC-LABEL: {{^}}ashr_v2i16:
		; FIXME: The ashr operation is uniform, but because its operands come from a
		; global load we end up with the vector instructions rather than scalar.
		; VI: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
		; VI: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
		define void @ashr_v2i16(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in) {
		%b_ptr = getelementptr <2 x i16>, <2 x i16> addrspace(1)* %in, i16 1
		%a = load <2 x i16>, <2 x i16> addrspace(1)* %in
		%b = load <2 x i16>, <2 x i16> addrspace(1)* %b_ptr
		%result = ashr <2 x i16> %a, %b
		store <2 x i16> %result, <2 x i16> addrspace(1)* %out
		ret void
		}

		; FUNC-LABEL: {{^}}ashr_v4i16:
		; FIXME: The ashr operation is uniform, but because its operands come from a
		; global load we end up with the vector instructions rather than scalar.
		; VI: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
		; VI: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
		; VI: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
		; VI: v_ashrrev_i32_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
		define void @ashr_v4i16(<4 x i16> addrspace(1)* %out, <4 x i16> addrspace(1)* %in) {
		%b_ptr = getelementptr <4 x i16>, <4 x i16> addrspace(1)* %in, i16 1
		%a = load <4 x i16>, <4 x i16> addrspace(1)* %in
		%b = load <4 x i16>, <4 x i16> addrspace(1)* %b_ptr
		%result = ashr <4 x i16> %a, %b
		store <4 x i16> %result, <4 x i16> addrspace(1)* %out
		ret void
		}

; FUNC-LABEL: {{^}}s_ashr_i64:		; FUNC-LABEL: {{^}}s_ashr_i64:
; GCN: s_ashr_i64 s[{{[0-9]}}:{{[0-9]}}], s[{{[0-9]}}:{{[0-9]}}], 8		; GCN: s_ashr_i64 s[{{[0-9]}}:{{[0-9]}}], s[{{[0-9]}}:{{[0-9]}}], 8

; EG: ASHR		; EG: ASHR
define void @s_ashr_i64(i64 addrspace(1)* %out, i32 %in) {		define void @s_ashr_i64(i64 addrspace(1)* %out, i32 %in) {
entry:		entry:
%in.ext = sext i32 %in to i64		%in.ext = sext i32 %in to i64
%ashr = ashr i64 %in.ext, 8		%ashr = ashr i64 %in.ext, 8
[202 lines not shown]

test/CodeGen/AMDGPU/sub.ll

[48 lines not shown]	define void @test_sub_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
%b_ptr = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %in, i32 1		%b_ptr = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %in, i32 1
%a = load <4 x i32>, <4 x i32> addrspace(1) * %in		%a = load <4 x i32>, <4 x i32> addrspace(1) * %in
%b = load <4 x i32>, <4 x i32> addrspace(1) * %b_ptr		%b = load <4 x i32>, <4 x i32> addrspace(1) * %b_ptr
%result = sub <4 x i32> %a, %b		%result = sub <4 x i32> %a, %b
store <4 x i32> %result, <4 x i32> addrspace(1)* %out		store <4 x i32> %result, <4 x i32> addrspace(1)* %out
ret void		ret void
}		}

		; VI: v_sub_i16_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
		define void @test_sub_i16(i16 addrspace(1)* %out, i16 addrspace(1)* %in) {
		%b_ptr = getelementptr i16, i16 addrspace(1)* %in, i16 1
		%a = load i16, i16 addrspace(1)* %in
		%b = load i16, i16 addrspace(1)* %b_ptr
		%result = sub i16 %a, %b
		store i16 %result, i16 addrspace(1)* %out
		ret void
		}

		; FUNC-LABEL: {{^}}test_sub_v2i16:

		; VI: v_sub_i16_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
		; VI: v_sub_i16_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}

		define void @test_sub_v2i16(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in) {
		%b_ptr = getelementptr <2 x i16>, <2 x i16> addrspace(1)* %in, i16 1
		%a = load <2 x i16>, <2 x i16> addrspace(1) * %in
		%b = load <2 x i16>, <2 x i16> addrspace(1) * %b_ptr
		%result = sub <2 x i16> %a, %b
		store <2 x i16> %result, <2 x i16> addrspace(1)* %out
		ret void
		}

		; FUNC-LABEL: {{^}}test_sub_v4i16:

		; VI: v_sub_i16_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
		; VI: v_sub_i16_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
		; VI: v_sub_i16_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
		; VI: v_sub_i16_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}

		define void @test_sub_v4i16(<4 x i16> addrspace(1)* %out, <4 x i16> addrspace(1)* %in) {
		%b_ptr = getelementptr <4 x i16>, <4 x i16> addrspace(1)* %in, i16 1
		%a = load <4 x i16>, <4 x i16> addrspace(1) * %in
		%b = load <4 x i16>, <4 x i16> addrspace(1) * %b_ptr
		%result = sub <4 x i16> %a, %b
		store <4 x i16> %result, <4 x i16> addrspace(1)* %out
		ret void
		}

; FUNC-LABEL: {{^}}s_sub_i64:		; FUNC-LABEL: {{^}}s_sub_i64:
; SI: s_sub_u32		; SI: s_sub_u32
; SI: s_subb_u32		; SI: s_subb_u32

; EG: MEM_RAT_CACHELESS STORE_RAW T{{[0-9]+}}.XY		; EG: MEM_RAT_CACHELESS STORE_RAW T{{[0-9]+}}.XY
; EG-DAG: SUB_INT {{[* ]*}}		; EG-DAG: SUB_INT {{[* ]*}}
; EG-DAG: SUBB_UINT		; EG-DAG: SUBB_UINT
; EG-DAG: SUB_INT		; EG-DAG: SUB_INT
[62 lines not shown]

test/CodeGen/AMDGPU/trunc-bitcast-vector.ll

; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck %s		; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck --check-prefix=SI %s
; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck %s		; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck --check-prefix=VI %s

; CHECK-LABEL: {{^}}trunc_i64_bitcast_v2i32:		; CHECK-LABEL: {{^}}trunc_i64_bitcast_v2i32:
; CHECK: buffer_load_dword v		; CHECK: buffer_load_dword v
; CHECK: buffer_store_dword v		; CHECK: buffer_store_dword v
define void @trunc_i64_bitcast_v2i32(i32 addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {		define void @trunc_i64_bitcast_v2i32(i32 addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
%ld = load <2 x i32>, <2 x i32> addrspace(1)* %in		%ld = load <2 x i32>, <2 x i32> addrspace(1)* %in
%bc = bitcast <2 x i32> %ld to i64		%bc = bitcast <2 x i32> %ld to i64
%trunc = trunc i64 %bc to i32		%trunc = trunc i64 %bc to i32
[31 lines not shown]	define void @trunc_i16_bitcast_v2i16(i16 addrspace(1)* %out, <2 x i16> addrspace(1)* %in) {
%ld = load <2 x i16>, <2 x i16> addrspace(1)* %in		%ld = load <2 x i16>, <2 x i16> addrspace(1)* %in
%bc = bitcast <2 x i16> %ld to i32		%bc = bitcast <2 x i16> %ld to i32
%trunc = trunc i32 %bc to i16		%trunc = trunc i32 %bc to i16
store i16 %trunc, i16 addrspace(1)* %out		store i16 %trunc, i16 addrspace(1)* %out
ret void		ret void
}		}

; CHECK-LABEL: {{^}}trunc_i16_bitcast_v4i16:		; CHECK-LABEL: {{^}}trunc_i16_bitcast_v4i16:
; CHECK: buffer_load_dword [[VAL:v[0-9]+]]		; FIXME We need to teach the dagcombiner to reduce load width for:
		; t21: v2i32,ch = load<LD8[%in(addrspace=1)]> t12, t10, undef:i64
		; t23: i64 = bitcast t21
		; t30: i16 = truncate t23
		; SI: buffer_load_dword v[[VAL:[0-9]+]]
		; VI: buffer_load_dwordx2 v{{\[}}[[VAL:[0-9]+]]
; CHECK: buffer_store_short [[VAL]]		; CHECK: buffer_store_short [[VAL]]
define void @trunc_i16_bitcast_v4i16(i16 addrspace(1)* %out, <4 x i16> addrspace(1)* %in) {		define void @trunc_i16_bitcast_v4i16(i16 addrspace(1)* %out, <4 x i16> addrspace(1)* %in) {
%ld = load <4 x i16>, <4 x i16> addrspace(1)* %in		%ld = load <4 x i16>, <4 x i16> addrspace(1)* %in
%bc = bitcast <4 x i16> %ld to i64		%bc = bitcast <4 x i16> %ld to i64
%trunc = trunc i64 %bc to i16		%trunc = trunc i64 %bc to i16
store i16 %trunc, i16 addrspace(1)* %out		store i16 %trunc, i16 addrspace(1)* %out
ret void		ret void
}		}
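
Note on the FIXME in trunc_i16_bitcast_v4i16: the load, bitcast, truncate sequence only ever uses the low 16 bits of the loaded value, which is why a narrower load would be enough. The Python sketch below shows the equivalence on a byte buffer, assuming a little-endian memory layout; the function names and the sample data are made up for illustration.

```python
import struct


def trunc_via_wide_load(mem):
    # Load 8 bytes as one i64 (standing in for the bitcast <4 x i16> load),
    # then truncate to the low 16 bits.
    (wide,) = struct.unpack_from("<Q", mem, 0)
    return wide & 0xFFFF


def trunc_via_narrow_load(mem):
    # Load only the first 2 bytes as an i16.
    (narrow,) = struct.unpack_from("<H", mem, 0)
    return narrow


# Four 16-bit elements laid out little-endian, as a <4 x i16> would be.
sample = struct.pack("<4H", 0x1234, 0xAAAA, 0xBBBB, 0xCCCC)
```

Both helpers return element 0 of the vector for the same buffer, so the upper six bytes of the wide load are dead.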
[34 lines not shown]

test/CodeGen/AMDGPU/trunc-store-i1.ll

	[15 lines not shown]
	; SI-LABEL: {{^}}global_truncstore_i64_to_i1:			; SI-LABEL: {{^}}global_truncstore_i64_to_i1:
	; SI: buffer_store_byte			; SI: buffer_store_byte
	define void @global_truncstore_i64_to_i1(i1 addrspace(1)* %out, i64 %val) nounwind {			define void @global_truncstore_i64_to_i1(i1 addrspace(1)* %out, i64 %val) nounwind {
	%trunc = trunc i64 %val to i1			%trunc = trunc i64 %val to i1
	store i1 %trunc, i1 addrspace(1)* %out, align 1			store i1 %trunc, i1 addrspace(1)* %out, align 1
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}global_truncstore_i16_to_i1:			; SI-LABEL: {{^}}s_arg_global_truncstore_i16_to_i1:
	; SI: s_load_dword [[LOAD:s[0-9]+]],			; SI: s_load_dword [[LOAD:s[0-9]+]],
	; SI: s_and_b32 [[SREG:s[0-9]+]], [[LOAD]], 1			; SI: s_and_b32 [[SREG:s[0-9]+]], [[LOAD]], 1
	; SI: v_mov_b32_e32 [[VREG:v[0-9]+]], [[SREG]]			; SI: v_mov_b32_e32 [[VREG:v[0-9]+]], [[SREG]]
	; SI: buffer_store_byte [[VREG]],			; SI: buffer_store_byte [[VREG]],
	define void @global_truncstore_i16_to_i1(i1 addrspace(1)* %out, i16 %val) nounwind {			define void @s_arg_global_truncstore_i16_to_i1(i1 addrspace(1)* %out, i16 %val) nounwind {
	%trunc = trunc i16 %val to i1			%trunc = trunc i16 %val to i1
	store i1 %trunc, i1 addrspace(1)* %out, align 1			store i1 %trunc, i1 addrspace(1)* %out, align 1
	ret void			ret void
	}			}
				; SI-LABEL: {{^}}global_truncstore_i16_to_i1:
				define void @global_truncstore_i16_to_i1(i1 addrspace(1)* %out, i16 %val0, i16 %val1) nounwind {
				%add = add i16 %val0, %val1
				%trunc = trunc i16 %add to i1
				store i1 %trunc, i1 addrspace(1)* %out, align 1
				ret void
				}

test/CodeGen/AMDGPU/zero_extend.ll

	; RUN: llc < %s -march=amdgcn -verify-machineinstrs | FileCheck %s --check-prefix=SI			; RUN: llc < %s -march=amdgcn -verify-machineinstrs | FileCheck %s --check-prefix=SI
	; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s --check-prefix=SI			; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s --check-prefix=SI
	; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s --check-prefix=R600			; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s --check-prefix=R600

	; R600: {{^}}test:			; R600: {{^}}s_mad_zext_i32_to_i64:
	; R600: MEM_RAT_CACHELESS STORE_RAW			; R600: MEM_RAT_CACHELESS STORE_RAW
	; R600: MEM_RAT_CACHELESS STORE_RAW			; R600: MEM_RAT_CACHELESS STORE_RAW

	; SI: {{^}}test:			; SI: {{^}}s_mad_zext_i32_to_i64:
	; SI: v_mov_b32_e32 v[[V_ZERO:[0-9]]], 0{{$}}			; SI: v_mov_b32_e32 v[[V_ZERO:[0-9]]], 0{{$}}
	; SI: buffer_store_dwordx2 v[0:[[V_ZERO]]{{\]}}			; SI: buffer_store_dwordx2 v[0:[[V_ZERO]]{{\]}}
	define void @test(i64 addrspace(1)* %out, i32 %a, i32 %b, i32 %c) {			define void @s_mad_zext_i32_to_i64(i64 addrspace(1)* %out, i32 %a, i32 %b, i32 %c) #0 {
	entry:			entry:
	%0 = mul i32 %a, %b			%tmp0 = mul i32 %a, %b
	%1 = add i32 %0, %c			%tmp1 = add i32 %tmp0, %c
	%2 = zext i32 %1 to i64			%tmp2 = zext i32 %tmp1 to i64
	store i64 %2, i64 addrspace(1)* %out			store i64 %tmp2, i64 addrspace(1)* %out
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}testi1toi32:			; SI-LABEL: {{^}}s_cmp_zext_i1_to_i32
	; SI: v_cndmask_b32			; SI: v_cndmask_b32
	define void @testi1toi32(i32 addrspace(1)* %out, i32 %a, i32 %b) {			define void @s_cmp_zext_i1_to_i32(i32 addrspace(1)* %out, i32 %a, i32 %b) #0 {
	entry:			entry:
	%0 = icmp eq i32 %a, %b			%tmp0 = icmp eq i32 %a, %b
	%1 = zext i1 %0 to i32			%tmp1 = zext i1 %tmp0 to i32
	store i32 %1, i32 addrspace(1)* %out			store i32 %tmp1, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}zext_i1_to_i64:			; SI-LABEL: {{^}}s_arg_zext_i1_to_i64:
				define void @s_arg_zext_i1_to_i64(i64 addrspace(1)* %out, i1 zeroext %arg) #0 {
				%ext = zext i1 %arg to i64
				store i64 %ext, i64 addrspace(1)* %out, align 8
				ret void
				}

				; SI-LABEL: {{^}}s_cmp_zext_i1_to_i64:
	; SI: s_mov_b32 s{{[0-9]+}}, 0			; SI: s_mov_b32 s{{[0-9]+}}, 0
	; SI: v_cmp_eq_u32			; SI: v_cmp_eq_u32
	; SI: v_cndmask_b32			; SI: v_cndmask_b32
	define void @zext_i1_to_i64(i64 addrspace(1)* %out, i32 %a, i32 %b) nounwind {			define void @s_cmp_zext_i1_to_i64(i64 addrspace(1)* %out, i32 %a, i32 %b) #0 {
	%cmp = icmp eq i32 %a, %b			%cmp = icmp eq i32 %a, %b
	%ext = zext i1 %cmp to i64			%ext = zext i1 %cmp to i64
	store i64 %ext, i64 addrspace(1)* %out, align 8			store i64 %ext, i64 addrspace(1)* %out, align 8
	ret void			ret void
	}			}

				; SI-LABEL: {{^}}s_cmp_zext_i1_to_i16
				; SI: v_cndmask_b32_e64 [[RESULT:v[0-9]+]], 0, 1, vcc
				; SI: buffer_store_short [[RESULT]]
				define void @s_cmp_zext_i1_to_i16(i16 addrspace(1)* %out, i16 zeroext %a, i16 zeroext %b) #0 {
				%tmp0 = icmp eq i16 %a, %b
				%tmp1 = zext i1 %tmp0 to i16
				store i16 %tmp1, i16 addrspace(1)* %out
				ret void
				}

				attributes #0 = { nounwind }

Revision Contents

Diff 76087

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

lib/Target/AMDGPU/AMDGPUInstructions.td

lib/Target/AMDGPU/BUFInstructions.td

lib/Target/AMDGPU/DSInstructions.td

lib/Target/AMDGPU/FLATInstructions.td

lib/Target/AMDGPU/SIISelLowering.cpp

lib/Target/AMDGPU/SIInstrInfo.td

lib/Target/AMDGPU/SIInstructions.td

lib/Target/AMDGPU/SIRegisterInfo.td

lib/Target/AMDGPU/SOPInstructions.td

lib/Target/AMDGPU/VIInstructions.td

lib/Target/AMDGPU/VOP1Instructions.td

lib/Target/AMDGPU/VOP2Instructions.td

lib/Target/AMDGPU/VOP3Instructions.td

test/CodeGen/AMDGPU/add.i16.ll

test/CodeGen/AMDGPU/anyext.ll

test/CodeGen/AMDGPU/bitreverse.ll

test/CodeGen/AMDGPU/cgp-bitfield-extract.ll

test/CodeGen/AMDGPU/copy-illegal-type.ll

test/CodeGen/AMDGPU/ctlz.ll

test/CodeGen/AMDGPU/ctlz_zero_undef.ll

test/CodeGen/AMDGPU/cube.ll

test/CodeGen/AMDGPU/cvt_f32_ubyte.ll

test/CodeGen/AMDGPU/global-extload-i16.ll

test/CodeGen/AMDGPU/half.ll

test/CodeGen/AMDGPU/llvm.AMDGPU.bfe.u32.ll

test/CodeGen/AMDGPU/load-constant-i16.ll

test/CodeGen/AMDGPU/load-global-i16.ll

test/CodeGen/AMDGPU/load-global-i8.ll

test/CodeGen/AMDGPU/load-local-i16.ll

test/CodeGen/AMDGPU/load-local-i8.ll

test/CodeGen/AMDGPU/mad_uint24.ll

test/CodeGen/AMDGPU/max.i16.ll

test/CodeGen/AMDGPU/mul_uint24-amdgcn.ll

test/CodeGen/AMDGPU/shl.ll

test/CodeGen/AMDGPU/sign_extend.ll

test/CodeGen/AMDGPU/sra.ll

test/CodeGen/AMDGPU/sub.ll

test/CodeGen/AMDGPU/trunc-bitcast-vector.ll

test/CodeGen/AMDGPU/trunc-store-i1.ll

test/CodeGen/AMDGPU/zero_extend.ll
