This is an archive of the discontinued LLVM Phabricator instance.

Differential D52019

[AMDGPU] Divergence driven instruction selection. Part 1.
ClosedPublic

Authored by alex-t on Sep 13 2018, 1:54 AM.

Details

Reviewers

rampitec
dp

Summary

This change is the first part of the AMDGPU target description change. Its aim is to split the vector and scalar flows effectively at the selection stage. Selection uses predicate functions based on the framework implemented earlier in https://reviews.llvm.org/D35267

Tests: CodeGen/AMDGPU on Win32

make check-llvm on Linux
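
As an illustration only (not code from this patch), the split described in the summary can be sketched in C++: a node that is uniform across lanes selects a scalar (SALU) instruction, while a divergent node selects the vector (VALU) form. `Node`, `isUniform` and `selectAnd` are hypothetical names; the only detail taken from the diff is the predicate body `!N->isDivergent()` used by `UniformBinFrag`.

```cpp
#include <string>

// Hypothetical stand-in for a selection DAG node; only the divergence bit
// matters for this sketch.
struct Node {
  bool Divergent;
  bool isDivergent() const { return Divergent; }
};

// Same test as the UniformBinFrag predicate in the diff:
//   [{ return !N->isDivergent(); }]
inline bool isUniform(const Node &N) { return !N.isDivergent(); }

// Illustrative only: choose the scalar or the vector form of a 32-bit AND.
inline std::string selectAnd(const Node &N) {
  return isUniform(N) ? "s_and_b32" : "v_and_b32";
}
```

In the patch itself this choice is expressed declaratively through TableGen pattern fragments rather than through hand-written C++ like the above.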

Event Timeline

alex-t created this revision. Sep 13 2018, 1:54 AM

Herald added subscribers: t-tye, tpr, dstuttard and 6 others. Sep 13 2018, 1:54 AM

alex-t set the repository for this revision to rL LLVM. Sep 13 2018, 5:54 AM

alex-t added a subscriber: llvm-commits.

rampitec added inline comments. Sep 13 2018, 10:38 AM

lib/Target/AMDGPU/SIInstrInfo.td
1767	You can create an enum for the mode in SIInstrInfo.td; look at SIEncodingFamily as an example. Also, I might be missing something, but I do not see the value 1 used for mode anywhere.
lib/Target/AMDGPU/VOP2Instructions.td
365	Is there a reason to move it from its original place? This seems to grow the patch unnecessarily and makes review harder.
402	It would be nice to swap these instruction blocks back to minimize the diff. The order change is not needed for this patch.
496	Again, you probably do not need to move the adde and sube patterns.
509	Why are this reg_sequence and the extract_subregs needed? Why not just use i64 operands?
lib/Target/AMDGPU/VOPInstructions.td
576	Where do you set NeedPatGen = 1?
test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll
450	It is the same, isn't it? Why not keep it GCN-DAG?
478	Same here.

Source cleanup.

lib/Target/AMDGPU/VOP2Instructions.td
365	In fact, the reason was that the TableGen asm-matcher generator fails on empty prefix strings in a def like this: VOP2_Pseudo <"v_madmk_f32", VOP_MADMK_F32, [], "">. The empty string "" overrides the default VOP2_Pseudo "prefix" argument, which is prefix = "_e32". As a result, the asm-matcher generator complained about a mnemonic alias with the same string. For some mysterious reason this shows up only when Predicates[] is set explicitly; I don't know how the two are related. Yet more TableGen black magic. I moved these 4 defs back but removed the empty prefix "". Since nothing failed after this, I conclude that the empty prefix string was a hidden bug.
509	How can I use i64 operands with V_AND_B32, which operates on i32?
lib/Target/AMDGPU/VOPInstructions.td
576	Okay. Everything related to this flag is going to be used later on, when I add the patterns for the extended encoding forms. I agree that it should be removed from this patch.

rampitec added a reviewer: dp. Sep 19 2018, 2:55 PM

rampitec added inline comments.

lib/Target/AMDGPU/VOP2Instructions.td
365	I think it was supposed to suppress printing of the _e32 suffix in a command like: llvm-mc -arch=amdgcn -mcpu=fiji <<< 'v_madmk_f32 v0, v0, 1.0, v0' We do have tests for it in MC/Disassembler/AMDGPU/, though, so as long as they pass it should be fine.
509	OK, thanks.
test/CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll
19–20	This must be a leftover from some experiments, there is no such check.
test/CodeGen/AMDGPU/shift-i128.ll
24	Was it done with update_llc_test_checks.py? It is strange to see indents changed. I suspect the file was really changed manually instead.

MC/Disassembler/AMDGPU passed
Tests fixed

alex-t marked 2 inline comments as done. Sep 20 2018, 4:26 AM

LGTM. Thanks!

This revision is now accepted and ready to land. Sep 20 2018, 10:15 AM

Committed r342719.

dp accepted this revision. Sep 21 2018, 6:37 AM

alex-t closed this revision. Sep 26 2018, 9:28 AM

Hi there,

This change introduces a regression with RADV and Doom 2016, see https://bugs.freedesktop.org/show_bug.cgi?id=110636
We just discovered this, so LLVM 8 and master are affected. :(
I'm trying to find the root cause, I will let you know.

In D52019#1496145, @hakzsam wrote:

Hi there,

This change introduces a regression with RADV and Doom 2016, see https://bugs.freedesktop.org/show_bug.cgi?id=110636
We just discovered this, so LLVM 8 and master are affected. :(
I'm trying to find the root cause, I will let you know.

Did r360293 fix it?

In D52019#1496159, @arsenm wrote:

Did r360293 fix it?

Yes, it did fix the problem.
Do you plan to backport to LLVM 8?

In D52019#1496188, @hakzsam wrote:

Yes, it did fix the problem.
Do you plan to backport to LLVM 8?

https://bugs.llvm.org/show_bug.cgi?id=41811

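Revision Contents

Path	Size
lib/Target/AMDGPU/
(file name not preserved)	4 lines
(file name not preserved)	10 lines
(file name not preserved)	8 lines
(file name not preserved)	45 lines
(file name not preserved)	102 lines
(file name not preserved)	44 lines
test/CodeGen/AMDGPU/
add.ll	2 lines
amdgcn.private-memory.ll	2 lines
(file name not preserved)	14 lines
(file name not preserved)	2 lines
(file name not preserved)	4 lines
(file name not preserved)	9 lines
insert_vector_elt.v2i16.ll	12 lines
llvm.amdgcn.update.dpp.ll	3 lines
(file name not preserved)	97 lines
(file name not preserved)	2 lines
(file name not preserved)	12 lines
(file name not preserved)	2 lines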

Diff 166260

lib/Target/AMDGPU/AMDGPU.td

[686 lines not shown; context: def D16PreservesUnusedBits : Predicate<"Subtarget->d16PreservesUnusedBits()">,]
AssemblerPredicate<"FeatureD16PreservesUnusedBits">;		AssemblerPredicate<"FeatureD16PreservesUnusedBits">;

def LDSRequiresM0Init : Predicate<"Subtarget->ldsRequiresM0Init()">;		def LDSRequiresM0Init : Predicate<"Subtarget->ldsRequiresM0Init()">;
def NotLDSRequiresM0Init : Predicate<"!Subtarget->ldsRequiresM0Init()">;		def NotLDSRequiresM0Init : Predicate<"!Subtarget->ldsRequiresM0Init()">;

def HasDSAddTid : Predicate<"Subtarget->getGeneration() >= AMDGPUSubtarget::GFX9">,		def HasDSAddTid : Predicate<"Subtarget->getGeneration() >= AMDGPUSubtarget::GFX9">,
AssemblerPredicate<"FeatureGFX9Insts">;		AssemblerPredicate<"FeatureGFX9Insts">;

def HasAddNoCarryInsts : Predicate<"Subtarget->hasAddNoCarryInsts()">,		def HasAddNoCarryInsts : Predicate<"Subtarget->hasAddNoCarry()">,
AssemblerPredicate<"FeatureAddNoCarryInsts">;		AssemblerPredicate<"FeatureAddNoCarryInsts">;

def NotHasAddNoCarryInsts : Predicate<"!Subtarget->hasAddNoCarryInsts()">,		def NotHasAddNoCarryInsts : Predicate<"!Subtarget->hasAddNoCarry()">,
AssemblerPredicate<"!FeatureAddNoCarryInsts">;		AssemblerPredicate<"!FeatureAddNoCarryInsts">;

def Has16BitInsts : Predicate<"Subtarget->has16BitInsts()">,		def Has16BitInsts : Predicate<"Subtarget->has16BitInsts()">,
AssemblerPredicate<"Feature16BitInsts">;		AssemblerPredicate<"Feature16BitInsts">;
def HasVOP3PInsts : Predicate<"Subtarget->hasVOP3PInsts()">,		def HasVOP3PInsts : Predicate<"Subtarget->hasVOP3PInsts()">,
AssemblerPredicate<"FeatureVOP3P">;		AssemblerPredicate<"FeatureVOP3P">;

def NotHasVOP3PInsts : Predicate<"!Subtarget->hasVOP3PInsts()">,		def NotHasVOP3PInsts : Predicate<"!Subtarget->hasVOP3PInsts()">,
[52 lines not shown]

lib/Target/AMDGPU/SIInstrInfo.td

[1,633 lines not shown]
class BitOr<bit a, bit b> {		class BitOr<bit a, bit b> {
bit ret = !if(a, 1, !if(b, 1, 0));		bit ret = !if(a, 1, !if(b, 1, 0));
}		}

class BitAnd<bit a, bit b> {		class BitAnd<bit a, bit b> {
bit ret = !if(a, !if(b, 1, 0), 0);		bit ret = !if(a, !if(b, 1, 0), 0);
}		}

		def PatGenMode {
		int NoPattern = 0;
		int Pattern = 1;
		}

class VOPProfile <list<ValueType> _ArgVT> {		class VOPProfile <list<ValueType> _ArgVT> {

field list<ValueType> ArgVT = _ArgVT;		field list<ValueType> ArgVT = _ArgVT;

field ValueType DstVT = ArgVT[0];		field ValueType DstVT = ArgVT[0];
field ValueType Src0VT = ArgVT[1];		field ValueType Src0VT = ArgVT[1];
field ValueType Src1VT = ArgVT[2];		field ValueType Src1VT = ArgVT[2];
field ValueType Src2VT = ArgVT[3];		field ValueType Src2VT = ArgVT[3];
[51 lines not shown; context: class VOPProfile <list<ValueType> _ArgVT> {]

field bit IsPacked = isPackedType<Src0VT>.ret;		field bit IsPacked = isPackedType<Src0VT>.ret;
field bit HasOpSel = IsPacked;		field bit HasOpSel = IsPacked;
field bit HasOMod = !if(HasOpSel, 0, isFloatType<DstVT>.ret);		field bit HasOMod = !if(HasOpSel, 0, isFloatType<DstVT>.ret);
field bit HasSDWAOMod = isFloatType<DstVT>.ret;		field bit HasSDWAOMod = isFloatType<DstVT>.ret;

field bit HasExt = getHasExt<NumSrcArgs, DstVT, Src0VT, Src1VT>.ret;		field bit HasExt = getHasExt<NumSrcArgs, DstVT, Src0VT, Src1VT>.ret;
field bit HasSDWA9 = HasExt;		field bit HasSDWA9 = HasExt;
		field int NeedPatGen = PatGenMode.NoPattern;

field Operand Src0PackedMod = !if(HasSrc0FloatMods, PackedF16InputMods, PackedI16InputMods);		field Operand Src0PackedMod = !if(HasSrc0FloatMods, PackedF16InputMods, PackedI16InputMods);
field Operand Src1PackedMod = !if(HasSrc1FloatMods, PackedF16InputMods, PackedI16InputMods);		field Operand Src1PackedMod = !if(HasSrc1FloatMods, PackedF16InputMods, PackedI16InputMods);
field Operand Src2PackedMod = !if(HasSrc2FloatMods, PackedF16InputMods, PackedI16InputMods);		field Operand Src2PackedMod = !if(HasSrc2FloatMods, PackedF16InputMods, PackedI16InputMods);

field dag Outs = !if(HasDst,(outs DstRC:$vdst),(outs));		field dag Outs = !if(HasDst,(outs DstRC:$vdst),(outs));

// VOP3b instructions are a special case with a second explicit		// VOP3b instructions are a special case with a second explicit
[36 lines not shown; context: class VOPProfile <list<ValueType> _ArgVT> {]
field string AsmSDWA9 = getAsmSDWA9<HasDst, HasSDWAOMod, NumSrcArgs, DstVT>.ret;		field string AsmSDWA9 = getAsmSDWA9<HasDst, HasSDWAOMod, NumSrcArgs, DstVT>.ret;
}		}

class VOP_NO_EXT <VOPProfile p> : VOPProfile <p.ArgVT> {		class VOP_NO_EXT <VOPProfile p> : VOPProfile <p.ArgVT> {
let HasExt = 0;		let HasExt = 0;
let HasSDWA9 = 0;		let HasSDWA9 = 0;
}		}

		class VOP_PAT_GEN <VOPProfile p, int mode=PatGenMode.Pattern> : VOPProfile <p.ArgVT> {
		let NeedPatGen = mode;
		}

def VOP_F16_F16 : VOPProfile <[f16, f16, untyped, untyped]>;		def VOP_F16_F16 : VOPProfile <[f16, f16, untyped, untyped]>;
def VOP_F16_I16 : VOPProfile <[f16, i16, untyped, untyped]>;		def VOP_F16_I16 : VOPProfile <[f16, i16, untyped, untyped]>;
def VOP_I16_F16 : VOPProfile <[i16, f16, untyped, untyped]>;		def VOP_I16_F16 : VOPProfile <[i16, f16, untyped, untyped]>;

def VOP_F16_F16_F16 : VOPProfile <[f16, f16, f16, untyped]>;		def VOP_F16_F16_F16 : VOPProfile <[f16, f16, f16, untyped]>;
def VOP_F16_F16_I16 : VOPProfile <[f16, f16, i16, untyped]>;		def VOP_F16_F16_I16 : VOPProfile <[f16, f16, i16, untyped]>;
def VOP_F16_F16_I32 : VOPProfile <[f16, f16, i32, untyped]>;		def VOP_F16_F16_I32 : VOPProfile <[f16, f16, i32, untyped]>;
def VOP_I16_I16_I16 : VOPProfile <[i16, i16, i16, untyped]>;		def VOP_I16_I16_I16 : VOPProfile <[i16, i16, i16, untyped]>;
[233 lines not shown]

lib/Target/AMDGPU/SIInstructions.td

[9 lines not shown]
// all the instruction definitions were originally commented out. Instructions		// all the instruction definitions were originally commented out. Instructions
// that are not yet supported remain commented out.		// that are not yet supported remain commented out.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

class GCNPat<dag pattern, dag result> : Pat<pattern, result>, GCNPredicateControl {		class GCNPat<dag pattern, dag result> : Pat<pattern, result>, GCNPredicateControl {
let SubtargetPredicate = isGCN;		let SubtargetPredicate = isGCN;
}		}

include "VOPInstructions.td"
include "SOPInstructions.td"		include "SOPInstructions.td"
		include "VOPInstructions.td"
include "SMInstructions.td"		include "SMInstructions.td"
include "FLATInstructions.td"		include "FLATInstructions.td"
include "BUFInstructions.td"		include "BUFInstructions.td"

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// EXP Instructions		// EXP Instructions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

[694 lines not shown; context: multiclass SelectPat <ValueType vt, Instruction inst> {]
>;		>;
}		}

defm : SelectPat <i16, V_CNDMASK_B32_e64>;		defm : SelectPat <i16, V_CNDMASK_B32_e64>;
defm : SelectPat <i32, V_CNDMASK_B32_e64>;		defm : SelectPat <i32, V_CNDMASK_B32_e64>;
defm : SelectPat <f16, V_CNDMASK_B32_e64>;		defm : SelectPat <f16, V_CNDMASK_B32_e64>;
defm : SelectPat <f32, V_CNDMASK_B32_e64>;		defm : SelectPat <f32, V_CNDMASK_B32_e64>;

		let AddedComplexity = 1 in {
def : GCNPat <		def : GCNPat <
(i32 (add (i32 (ctpop i32:$popcnt)), i32:$val)),		(i32 (add (i32 (getDivergentFrag<ctpop>.ret i32:$popcnt)), i32:$val)),
(V_BCNT_U32_B32_e64 $popcnt, $val)		(V_BCNT_U32_B32_e64 $popcnt, $val)
>;		>;
		}
def : GCNPat <		def : GCNPat <
(i16 (add (i16 (trunc (ctpop i32:$popcnt))), i16:$val)),		(i16 (add (i16 (trunc (getDivergentFrag<ctpop>.ret i32:$popcnt))), i16:$val)),
(V_BCNT_U32_B32_e64 $popcnt, $val)		(V_BCNT_U32_B32_e64 $popcnt, $val)
>;		>;

/******** ============================================ ********/		/******** ============================================ ********/
/******** Extraction, Insertion, Building and Casting ********/		/******** Extraction, Insertion, Building and Casting ********/
/******** ============================================ ********/		/******** ============================================ ********/

foreach Index = 0-2 in {		foreach Index = 0-2 in {
[906 lines not shown]

lib/Target/AMDGPU/SOPInstructions.td

[330 lines not shown; context: class SOP2_64_32 <string opName, list<dag> pattern=[]> : SOP2_Pseudo <]
"$sdst, $src0, $src1", pattern		"$sdst, $src0, $src1", pattern
>;		>;

class SOP2_64_32_32 <string opName, list<dag> pattern=[]> : SOP2_Pseudo <		class SOP2_64_32_32 <string opName, list<dag> pattern=[]> : SOP2_Pseudo <
opName, (outs SReg_64:$sdst), (ins SSrc_b32:$src0, SSrc_b32:$src1),		opName, (outs SReg_64:$sdst), (ins SSrc_b32:$src0, SSrc_b32:$src1),
"$sdst, $src0, $src1", pattern		"$sdst, $src0, $src1", pattern
>;		>;

		class UniformBinFrag<SDPatternOperator Op> : PatFrag <
		(ops node:$src0, node:$src1),
		(Op $src0, $src1),
		[{ return !N->isDivergent(); }]
		>;

let Defs = [SCC] in { // Carry out goes to SCC		let Defs = [SCC] in { // Carry out goes to SCC
let isCommutable = 1 in {		let isCommutable = 1 in {
def S_ADD_U32 : SOP2_32 <"s_add_u32">;		def S_ADD_U32 : SOP2_32 <"s_add_u32">;
def S_ADD_I32 : SOP2_32 <"s_add_i32",		def S_ADD_I32 : SOP2_32 <"s_add_i32",
[(set i32:$sdst, (add SSrc_b32:$src0, SSrc_b32:$src1))]		[(set i32:$sdst, (UniformBinFrag<add> SSrc_b32:$src0, SSrc_b32:$src1))]
>;		>;
} // End isCommutable = 1		} // End isCommutable = 1

def S_SUB_U32 : SOP2_32 <"s_sub_u32">;		def S_SUB_U32 : SOP2_32 <"s_sub_u32">;
def S_SUB_I32 : SOP2_32 <"s_sub_i32",		def S_SUB_I32 : SOP2_32 <"s_sub_i32",
[(set i32:$sdst, (sub SSrc_b32:$src0, SSrc_b32:$src1))]		[(set i32:$sdst, (UniformBinFrag<sub> SSrc_b32:$src0, SSrc_b32:$src1))]
>;		>;

let Uses = [SCC] in { // Carry in comes from SCC		let Uses = [SCC] in { // Carry in comes from SCC
let isCommutable = 1 in {		let isCommutable = 1 in {
def S_ADDC_U32 : SOP2_32 <"s_addc_u32",		def S_ADDC_U32 : SOP2_32 <"s_addc_u32",
[(set i32:$sdst, (adde (i32 SSrc_b32:$src0), (i32 SSrc_b32:$src1)))]>;		[(set i32:$sdst, (UniformBinFrag<adde> (i32 SSrc_b32:$src0), (i32 SSrc_b32:$src1)))]>;
} // End isCommutable = 1		} // End isCommutable = 1

def S_SUBB_U32 : SOP2_32 <"s_subb_u32",		def S_SUBB_U32 : SOP2_32 <"s_subb_u32",
[(set i32:$sdst, (sube (i32 SSrc_b32:$src0), (i32 SSrc_b32:$src1)))]>;		[(set i32:$sdst, (UniformBinFrag<sube> (i32 SSrc_b32:$src0), (i32 SSrc_b32:$src1)))]>;
} // End Uses = [SCC]		} // End Uses = [SCC]


let isCommutable = 1 in {		let isCommutable = 1 in {
def S_MIN_I32 : SOP2_32 <"s_min_i32",		def S_MIN_I32 : SOP2_32 <"s_min_i32",
[(set i32:$sdst, (smin i32:$src0, i32:$src1))]		[(set i32:$sdst, (UniformBinFrag<smin> i32:$src0, i32:$src1))]
>;		>;
def S_MIN_U32 : SOP2_32 <"s_min_u32",		def S_MIN_U32 : SOP2_32 <"s_min_u32",
[(set i32:$sdst, (umin i32:$src0, i32:$src1))]		[(set i32:$sdst, (UniformBinFrag<umin> i32:$src0, i32:$src1))]
>;		>;
def S_MAX_I32 : SOP2_32 <"s_max_i32",		def S_MAX_I32 : SOP2_32 <"s_max_i32",
[(set i32:$sdst, (smax i32:$src0, i32:$src1))]		[(set i32:$sdst, (UniformBinFrag<smax> i32:$src0, i32:$src1))]
>;		>;
def S_MAX_U32 : SOP2_32 <"s_max_u32",		def S_MAX_U32 : SOP2_32 <"s_max_u32",
[(set i32:$sdst, (umax i32:$src0, i32:$src1))]		[(set i32:$sdst, (UniformBinFrag<umax> i32:$src0, i32:$src1))]
>;		>;
} // End isCommutable = 1		} // End isCommutable = 1
} // End Defs = [SCC]		} // End Defs = [SCC]


let Uses = [SCC] in {		let Uses = [SCC] in {
def S_CSELECT_B32 : SOP2_32 <"s_cselect_b32">;		def S_CSELECT_B32 : SOP2_32 <"s_cselect_b32">;
def S_CSELECT_B64 : SOP2_64 <"s_cselect_b64">;		def S_CSELECT_B64 : SOP2_64 <"s_cselect_b64">;
} // End Uses = [SCC]		} // End Uses = [SCC]

let Defs = [SCC] in {		let Defs = [SCC] in {
let isCommutable = 1 in {		let isCommutable = 1 in {
def S_AND_B32 : SOP2_32 <"s_and_b32",		def S_AND_B32 : SOP2_32 <"s_and_b32",
[(set i32:$sdst, (and i32:$src0, i32:$src1))]		[(set i32:$sdst, (UniformBinFrag<and> i32:$src0, i32:$src1))]
>;		>;

def S_AND_B64 : SOP2_64 <"s_and_b64",		def S_AND_B64 : SOP2_64 <"s_and_b64",
[(set i64:$sdst, (and i64:$src0, i64:$src1))]		[(set i64:$sdst, (UniformBinFrag<and> i64:$src0, i64:$src1))]
>;		>;

def S_OR_B32 : SOP2_32 <"s_or_b32",		def S_OR_B32 : SOP2_32 <"s_or_b32",
[(set i32:$sdst, (or i32:$src0, i32:$src1))]		[(set i32:$sdst, (UniformBinFrag<or> i32:$src0, i32:$src1))]
>;		>;

def S_OR_B64 : SOP2_64 <"s_or_b64",		def S_OR_B64 : SOP2_64 <"s_or_b64",
[(set i64:$sdst, (or i64:$src0, i64:$src1))]		[(set i64:$sdst, (UniformBinFrag<or> i64:$src0, i64:$src1))]
>;		>;

def S_XOR_B32 : SOP2_32 <"s_xor_b32",		def S_XOR_B32 : SOP2_32 <"s_xor_b32",
[(set i32:$sdst, (xor i32:$src0, i32:$src1))]		[(set i32:$sdst, (UniformBinFrag<xor> i32:$src0, i32:$src1))]
>;		>;

def S_XOR_B64 : SOP2_64 <"s_xor_b64",		def S_XOR_B64 : SOP2_64 <"s_xor_b64",
[(set i64:$sdst, (xor i64:$src0, i64:$src1))]		[(set i64:$sdst, (UniformBinFrag<xor> i64:$src0, i64:$src1))]
>;		>;

def S_XNOR_B32 : SOP2_32 <"s_xnor_b32",		def S_XNOR_B32 : SOP2_32 <"s_xnor_b32",
[(set i32:$sdst, (not (xor_oneuse i32:$src0, i32:$src1)))]		[(set i32:$sdst, (not (xor_oneuse i32:$src0, i32:$src1)))]
>;		>;

def S_XNOR_B64 : SOP2_64 <"s_xnor_b64",		def S_XNOR_B64 : SOP2_64 <"s_xnor_b64",
[(set i64:$sdst, (not (xor_oneuse i64:$src0, i64:$src1)))]		[(set i64:$sdst, (not (xor_oneuse i64:$src0, i64:$src1)))]
[9 lines not shown]
def S_NOR_B32 : SOP2_32 <"s_nor_b32">;		def S_NOR_B32 : SOP2_32 <"s_nor_b32">;
def S_NOR_B64 : SOP2_64 <"s_nor_b64">;		def S_NOR_B64 : SOP2_64 <"s_nor_b64">;
} // End Defs = [SCC]		} // End Defs = [SCC]

// Use added complexity so these patterns are preferred to the VALU patterns.		// Use added complexity so these patterns are preferred to the VALU patterns.
let AddedComplexity = 1 in {		let AddedComplexity = 1 in {

let Defs = [SCC] in {		let Defs = [SCC] in {
		// TODO: b64 versions require VOP3 change since v_lshlrev_b64 is VOP3
def S_LSHL_B32 : SOP2_32 <"s_lshl_b32",		def S_LSHL_B32 : SOP2_32 <"s_lshl_b32",
[(set i32:$sdst, (shl i32:$src0, i32:$src1))]		[(set i32:$sdst, (UniformBinFrag<shl> i32:$src0, i32:$src1))]
>;		>;
def S_LSHL_B64 : SOP2_64_32 <"s_lshl_b64",		def S_LSHL_B64 : SOP2_64_32 <"s_lshl_b64",
[(set i64:$sdst, (shl i64:$src0, i32:$src1))]		[(set i64:$sdst, (shl i64:$src0, i32:$src1))]
>;		>;
def S_LSHR_B32 : SOP2_32 <"s_lshr_b32",		def S_LSHR_B32 : SOP2_32 <"s_lshr_b32",
[(set i32:$sdst, (srl i32:$src0, i32:$src1))]		[(set i32:$sdst, (UniformBinFrag<srl> i32:$src0, i32:$src1))]
>;		>;
def S_LSHR_B64 : SOP2_64_32 <"s_lshr_b64",		def S_LSHR_B64 : SOP2_64_32 <"s_lshr_b64",
[(set i64:$sdst, (srl i64:$src0, i32:$src1))]		[(set i64:$sdst, (srl i64:$src0, i32:$src1))]
>;		>;
def S_ASHR_I32 : SOP2_32 <"s_ashr_i32",		def S_ASHR_I32 : SOP2_32 <"s_ashr_i32",
[(set i32:$sdst, (sra i32:$src0, i32:$src1))]		[(set i32:$sdst, (UniformBinFrag<sra> i32:$src0, i32:$src1))]
>;		>;
def S_ASHR_I64 : SOP2_64_32 <"s_ashr_i64",		def S_ASHR_I64 : SOP2_64_32 <"s_ashr_i64",
[(set i64:$sdst, (sra i64:$src0, i32:$src1))]		[(set i64:$sdst, (sra i64:$src0, i32:$src1))]
>;		>;
} // End Defs = [SCC]		} // End Defs = [SCC]

def S_BFM_B32 : SOP2_32 <"s_bfm_b32",		def S_BFM_B32 : SOP2_32 <"s_bfm_b32",
[(set i32:$sdst, (AMDGPUbfm i32:$src0, i32:$src1))]>;		[(set i32:$sdst, (UniformBinFrag<AMDGPUbfm> i32:$src0, i32:$src1))]>;
def S_BFM_B64 : SOP2_64_32_32 <"s_bfm_b64">;		def S_BFM_B64 : SOP2_64_32_32 <"s_bfm_b64">;

		// TODO: S_MUL_I32 require V_MUL_LO_I32 from VOP3 change
def S_MUL_I32 : SOP2_32 <"s_mul_i32",		def S_MUL_I32 : SOP2_32 <"s_mul_i32",
[(set i32:$sdst, (mul i32:$src0, i32:$src1))]> {		[(set i32:$sdst, (mul i32:$src0, i32:$src1))]> {
let isCommutable = 1;		let isCommutable = 1;
}		}

} // End AddedComplexity = 1		} // End AddedComplexity = 1

let Defs = [SCC] in {		let Defs = [SCC] in {
[913 lines not shown]

lib/Target/AMDGPU/VOP2Instructions.td

[118 lines not shown]
multiclass VOP2Inst <string opName,		multiclass VOP2Inst <string opName,
VOPProfile P,		VOPProfile P,
SDPatternOperator node = null_frag,		SDPatternOperator node = null_frag,
string revOp = opName,		string revOp = opName,
bit GFX9Renamed = 0> {		bit GFX9Renamed = 0> {

let renamedInGFX9 = GFX9Renamed in {		let renamedInGFX9 = GFX9Renamed in {

def _e32 : VOP2_Pseudo <opName, P>,		def _e32 : VOP2_Pseudo <opName, P, VOPPatOrNull<node,P>.ret>,
Commutable_REV<revOp#"_e32", !eq(revOp, opName)>;		Commutable_REV<revOp#"_e32", !eq(revOp, opName)>;

def _e64 : VOP3_Pseudo <opName, P, getVOP2Pat64<node, P>.ret>,		def _e64 : VOP3_Pseudo <opName, P, getVOP2Pat64<node, P>.ret>,
Commutable_REV<revOp#"_e64", !eq(revOp, opName)>;		Commutable_REV<revOp#"_e64", !eq(revOp, opName)>;

def _sdwa : VOP2_SDWA_Pseudo <opName, P>;		def _sdwa : VOP2_SDWA_Pseudo <opName, P>;

}		}
}		}

multiclass VOP2bInst <string opName,		multiclass VOP2bInst <string opName,
VOPProfile P,		VOPProfile P,
SDPatternOperator node = null_frag,		SDPatternOperator node = null_frag,
string revOp = opName,		string revOp = opName,
bit GFX9Renamed = 0,		bit GFX9Renamed = 0,
bit useSGPRInput = !eq(P.NumSrcArgs, 3)> {		bit useSGPRInput = !eq(P.NumSrcArgs, 3)> {
let renamedInGFX9 = GFX9Renamed in {		let renamedInGFX9 = GFX9Renamed in {
let SchedRW = [Write32Bit, WriteSALU] in {		let SchedRW = [Write32Bit, WriteSALU] in {
let Uses = !if(useSGPRInput, [VCC, EXEC], [EXEC]), Defs = [VCC] in {		let Uses = !if(useSGPRInput, [VCC, EXEC], [EXEC]), Defs = [VCC] in {
def _e32 : VOP2_Pseudo <opName, P>,		def _e32 : VOP2_Pseudo <opName, P, VOPPatOrNull<node,P>.ret>,
Commutable_REV<revOp#"_e32", !eq(revOp, opName)>;		Commutable_REV<revOp#"_e32", !eq(revOp, opName)>;

def _sdwa : VOP2_SDWA_Pseudo <opName, P> {		def _sdwa : VOP2_SDWA_Pseudo <opName, P> {
let AsmMatchConverter = "cvtSdwaVOP2b";		let AsmMatchConverter = "cvtSdwaVOP2b";
}		}
}		}

def _e64 : VOP3_Pseudo <opName, P, getVOP2Pat64<node, P>.ret>,		def _e64 : VOP3_Pseudo <opName, P, getVOP2Pat64<node, P>.ret>,
[191 lines not shown; context: def VOP_WRITELANE : VOPProfile<[i32, i32, i32, i32]> {]
let HasSrc2 = 0;		let HasSrc2 = 0;
let HasSrc2Mods = 0;		let HasSrc2Mods = 0;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// VOP2 Instructions		// VOP2 Instructions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

let SubtargetPredicate = isGCN in {		let SubtargetPredicate = isGCN, Predicates = [isGCN] in {

defm V_CNDMASK_B32 : VOP2eInst <"v_cndmask_b32", VOP2e_I32_I32_I32_I1>;		defm V_CNDMASK_B32 : VOP2eInst <"v_cndmask_b32", VOP2e_I32_I32_I32_I1>;
def V_MADMK_F32 : VOP2_Pseudo <"v_madmk_f32", VOP_MADMK_F32, [], "">;		def V_MADMK_F32 : VOP2_Pseudo <"v_madmk_f32", VOP_MADMK_F32, []>;

let isCommutable = 1 in {		let isCommutable = 1 in {
defm V_ADD_F32 : VOP2Inst <"v_add_f32", VOP_F32_F32_F32, fadd>;		defm V_ADD_F32 : VOP2Inst <"v_add_f32", VOP_F32_F32_F32, fadd>;
defm V_SUB_F32 : VOP2Inst <"v_sub_f32", VOP_F32_F32_F32, fsub>;		defm V_SUB_F32 : VOP2Inst <"v_sub_f32", VOP_F32_F32_F32, fsub>;
defm V_SUBREV_F32 : VOP2Inst <"v_subrev_f32", VOP_F32_F32_F32, null_frag, "v_sub_f32">;		defm V_SUBREV_F32 : VOP2Inst <"v_subrev_f32", VOP_F32_F32_F32, null_frag, "v_sub_f32">;
defm V_MUL_LEGACY_F32 : VOP2Inst <"v_mul_legacy_f32", VOP_F32_F32_F32, AMDGPUfmul_legacy>;		defm V_MUL_LEGACY_F32 : VOP2Inst <"v_mul_legacy_f32", VOP_F32_F32_F32, AMDGPUfmul_legacy>;
defm V_MUL_F32 : VOP2Inst <"v_mul_f32", VOP_F32_F32_F32, fmul>;		defm V_MUL_F32 : VOP2Inst <"v_mul_f32", VOP_F32_F32_F32, fmul>;
defm V_MUL_I32_I24 : VOP2Inst <"v_mul_i32_i24", VOP_I32_I32_I32, AMDGPUmul_i24>;		defm V_MUL_I32_I24 : VOP2Inst <"v_mul_i32_i24", VOP_PAT_GEN<VOP_I32_I32_I32, 2>, AMDGPUmul_i24>;
defm V_MUL_HI_I32_I24 : VOP2Inst <"v_mul_hi_i32_i24", VOP_I32_I32_I32, AMDGPUmulhi_i24>;		defm V_MUL_HI_I32_I24 : VOP2Inst <"v_mul_hi_i32_i24", VOP_PAT_GEN<VOP_I32_I32_I32, 2>, AMDGPUmulhi_i24>;
defm V_MUL_U32_U24 : VOP2Inst <"v_mul_u32_u24", VOP_I32_I32_I32, AMDGPUmul_u24>;		defm V_MUL_U32_U24 : VOP2Inst <"v_mul_u32_u24", VOP_PAT_GEN<VOP_I32_I32_I32, 2>, AMDGPUmul_u24>;
defm V_MUL_HI_U32_U24 : VOP2Inst <"v_mul_hi_u32_u24", VOP_I32_I32_I32, AMDGPUmulhi_u24>;		defm V_MUL_HI_U32_U24 : VOP2Inst <"v_mul_hi_u32_u24", VOP_PAT_GEN<VOP_I32_I32_I32, 2>, AMDGPUmulhi_u24>;
defm V_MIN_F32 : VOP2Inst <"v_min_f32", VOP_F32_F32_F32, fminnum>;		defm V_MIN_F32 : VOP2Inst <"v_min_f32", VOP_F32_F32_F32, fminnum>;
defm V_MAX_F32 : VOP2Inst <"v_max_f32", VOP_F32_F32_F32, fmaxnum>;		defm V_MAX_F32 : VOP2Inst <"v_max_f32", VOP_F32_F32_F32, fmaxnum>;
defm V_MIN_I32 : VOP2Inst <"v_min_i32", VOP_I32_I32_I32>;		defm V_MIN_I32 : VOP2Inst <"v_min_i32", VOP_PAT_GEN<VOP_I32_I32_I32>, smin>;
defm V_MAX_I32 : VOP2Inst <"v_max_i32", VOP_I32_I32_I32>;		defm V_MAX_I32 : VOP2Inst <"v_max_i32", VOP_PAT_GEN<VOP_I32_I32_I32>, smax>;
defm V_MIN_U32 : VOP2Inst <"v_min_u32", VOP_I32_I32_I32>;		defm V_MIN_U32 : VOP2Inst <"v_min_u32", VOP_PAT_GEN<VOP_I32_I32_I32>, umin>;
defm V_MAX_U32 : VOP2Inst <"v_max_u32", VOP_I32_I32_I32>;		defm V_MAX_U32 : VOP2Inst <"v_max_u32", VOP_PAT_GEN<VOP_I32_I32_I32>, umax>;
defm V_LSHRREV_B32 : VOP2Inst <"v_lshrrev_b32", VOP_I32_I32_I32, null_frag, "v_lshr_b32">;		defm V_LSHRREV_B32 : VOP2Inst <"v_lshrrev_b32", VOP_I32_I32_I32, null_frag, "v_lshr_b32">;
defm V_ASHRREV_I32 : VOP2Inst <"v_ashrrev_i32", VOP_I32_I32_I32, null_frag, "v_ashr_i32">;		defm V_ASHRREV_I32 : VOP2Inst <"v_ashrrev_i32", VOP_I32_I32_I32, null_frag, "v_ashr_i32">;
defm V_LSHLREV_B32 : VOP2Inst <"v_lshlrev_b32", VOP_I32_I32_I32, null_frag, "v_lshl_b32">;		defm V_LSHLREV_B32 : VOP2Inst <"v_lshlrev_b32", VOP_I32_I32_I32, null_frag, "v_lshl_b32">;
defm V_AND_B32 : VOP2Inst <"v_and_b32", VOP_I32_I32_I32>;		defm V_AND_B32 : VOP2Inst <"v_and_b32", VOP_PAT_GEN<VOP_I32_I32_I32>, and>;
defm V_OR_B32 : VOP2Inst <"v_or_b32", VOP_I32_I32_I32>;		defm V_OR_B32 : VOP2Inst <"v_or_b32", VOP_PAT_GEN<VOP_I32_I32_I32>, or>;
defm V_XOR_B32 : VOP2Inst <"v_xor_b32", VOP_I32_I32_I32>;		defm V_XOR_B32 : VOP2Inst <"v_xor_b32", VOP_PAT_GEN<VOP_I32_I32_I32>, xor>;

let Constraints = "$vdst = $src2", DisableEncoding="$src2",		let Constraints = "$vdst = $src2", DisableEncoding="$src2",
isConvertibleToThreeAddress = 1 in {		isConvertibleToThreeAddress = 1 in {
defm V_MAC_F32 : VOP2Inst <"v_mac_f32", VOP_MAC_F32>;		defm V_MAC_F32 : VOP2Inst <"v_mac_f32", VOP_MAC_F32>;
}		}

def V_MADAK_F32 : VOP2_Pseudo <"v_madak_f32", VOP_MADAK_F32, [], "">;		def V_MADAK_F32 : VOP2_Pseudo <"v_madak_f32", VOP_MADAK_F32, []>;

// No patterns so that the scalar instructions are always selected.		// No patterns so that the scalar instructions are always selected.
// The scalar versions will be replaced with vector when needed later.		// The scalar versions will be replaced with vector when needed later.

// V_ADD_I32, V_SUB_I32, and V_SUBREV_I32 where renamed to *_U32 in VI,		// V_ADD_I32, V_SUB_I32, and V_SUBREV_I32 where renamed to *_U32 in VI,
// but the VI instructions behave the same as the SI versions.		// but the VI instructions behave the same as the SI versions.
defm V_ADD_I32 : VOP2bInst <"v_add_i32", VOP2b_I32_I1_I32_I32, null_frag, "v_add_i32", 1>;		defm V_ADD_I32 : VOP2bInst <"v_add_i32", VOP2b_I32_I1_I32_I32, null_frag, "v_add_i32", 1>;
defm V_SUB_I32 : VOP2bInst <"v_sub_i32", VOP2b_I32_I1_I32_I32, null_frag, "v_sub_i32", 1>;		defm V_SUB_I32 : VOP2bInst <"v_sub_i32", VOP2b_I32_I1_I32_I32, null_frag, "v_sub_i32", 1>;
defm V_SUBREV_I32 : VOP2bInst <"v_subrev_i32", VOP2b_I32_I1_I32_I32, null_frag, "v_sub_i32", 1>;		defm V_SUBREV_I32 : VOP2bInst <"v_subrev_i32", VOP2b_I32_I1_I32_I32, null_frag, "v_sub_i32", 1>;
defm V_ADDC_U32 : VOP2bInst <"v_addc_u32", VOP2b_I32_I1_I32_I32_I1, null_frag, "v_addc_u32", 1>;		defm V_ADDC_U32 : VOP2bInst <"v_addc_u32", VOP2b_I32_I1_I32_I32_I1, null_frag, "v_addc_u32", 1>;
defm V_SUBB_U32 : VOP2bInst <"v_subb_u32", VOP2b_I32_I1_I32_I32_I1, null_frag, "v_subb_u32", 1>;		defm V_SUBB_U32 : VOP2bInst <"v_subb_u32", VOP2b_I32_I1_I32_I32_I1, null_frag, "v_subb_u32", 1>;
defm V_SUBBREV_U32 : VOP2bInst <"v_subbrev_u32", VOP2b_I32_I1_I32_I32_I1, null_frag, "v_subb_u32", 1>;		defm V_SUBBREV_U32 : VOP2bInst <"v_subbrev_u32", VOP2b_I32_I1_I32_I32_I1, null_frag, "v_subb_u32", 1>;


		rampitec: It would be nice to swap these instructions' blocks back to minimize the diff. Order change is not needed for this patch.
let SubtargetPredicate = HasAddNoCarryInsts in {		let SubtargetPredicate = HasAddNoCarryInsts in {
defm V_ADD_U32 : VOP2Inst <"v_add_u32", VOP_I32_I32_I32, null_frag, "v_add_u32", 1>;		defm V_ADD_U32 : VOP2Inst <"v_add_u32", VOP_I32_I32_I32, null_frag, "v_add_u32", 1>;
defm V_SUB_U32 : VOP2Inst <"v_sub_u32", VOP_I32_I32_I32, null_frag, "v_sub_u32", 1>;		defm V_SUB_U32 : VOP2Inst <"v_sub_u32", VOP_I32_I32_I32, null_frag, "v_sub_u32", 1>;
defm V_SUBREV_U32 : VOP2Inst <"v_subrev_u32", VOP_I32_I32_I32, null_frag, "v_sub_u32", 1>;		defm V_SUBREV_U32 : VOP2Inst <"v_subrev_u32", VOP_I32_I32_I32, null_frag, "v_sub_u32", 1>;
}		}

} // End isCommutable = 1		} // End isCommutable = 1

// These are special and do not read the exec mask.		// These are special and do not read the exec mask.
let isConvergent = 1, Uses = []<Register> in {		let isConvergent = 1, Uses = []<Register> in {
def V_READLANE_B32 : VOP2_Pseudo<"v_readlane_b32", VOP_READLANE,		def V_READLANE_B32 : VOP2_Pseudo<"v_readlane_b32", VOP_READLANE,
[(set i32:$vdst, (int_amdgcn_readlane i32:$src0, i32:$src1))], "">;		[(set i32:$vdst, (int_amdgcn_readlane i32:$src0, i32:$src1))]>;

let Constraints = "$vdst = $vdst_in", DisableEncoding="$vdst_in" in {		let Constraints = "$vdst = $vdst_in", DisableEncoding="$vdst_in" in {
def V_WRITELANE_B32 : VOP2_Pseudo<"v_writelane_b32", VOP_WRITELANE,		def V_WRITELANE_B32 : VOP2_Pseudo<"v_writelane_b32", VOP_WRITELANE,
[(set i32:$vdst, (int_amdgcn_writelane i32:$src0, i32:$src1, i32:$vdst_in))], "">;		[(set i32:$vdst, (int_amdgcn_writelane i32:$src0, i32:$src1, i32:$vdst_in))]>;
} // End $vdst = $vdst_in, DisableEncoding $vdst_in		} // End $vdst = $vdst_in, DisableEncoding $vdst_in
} // End isConvergent = 1		} // End isConvergent = 1

defm V_BFM_B32 : VOP2Inst <"v_bfm_b32", VOP_NO_EXT<VOP_I32_I32_I32>>;		defm V_BFM_B32 : VOP2Inst <"v_bfm_b32", VOP_NO_EXT<VOP_I32_I32_I32>>;
defm V_BCNT_U32_B32 : VOP2Inst <"v_bcnt_u32_b32", VOP_NO_EXT<VOP_I32_I32_I32>>;		defm V_BCNT_U32_B32 : VOP2Inst <"v_bcnt_u32_b32", VOP_NO_EXT<VOP_I32_I32_I32>>;
defm V_MBCNT_LO_U32_B32 : VOP2Inst <"v_mbcnt_lo_u32_b32", VOP_NO_EXT<VOP_I32_I32_I32>, int_amdgcn_mbcnt_lo>;		defm V_MBCNT_LO_U32_B32 : VOP2Inst <"v_mbcnt_lo_u32_b32", VOP_NO_EXT<VOP_I32_I32_I32>, int_amdgcn_mbcnt_lo>;
defm V_MBCNT_HI_U32_B32 : VOP2Inst <"v_mbcnt_hi_u32_b32", VOP_NO_EXT<VOP_I32_I32_I32>, int_amdgcn_mbcnt_hi>;		defm V_MBCNT_HI_U32_B32 : VOP2Inst <"v_mbcnt_hi_u32_b32", VOP_NO_EXT<VOP_I32_I32_I32>, int_amdgcn_mbcnt_hi>;
defm V_LDEXP_F32 : VOP2Inst <"v_ldexp_f32", VOP_NO_EXT<VOP_F32_F32_I32>, AMDGPUldexp>;		defm V_LDEXP_F32 : VOP2Inst <"v_ldexp_f32", VOP_NO_EXT<VOP_F32_F32_I32>, AMDGPUldexp>;
defm V_CVT_PKACCUM_U8_F32 : VOP2Inst <"v_cvt_pkaccum_u8_f32", VOP_NO_EXT<VOP_I32_F32_I32>>; // TODO: set "Uses = dst"		defm V_CVT_PKACCUM_U8_F32 : VOP2Inst <"v_cvt_pkaccum_u8_f32", VOP_NO_EXT<VOP_I32_F32_I32>>; // TODO: set "Uses = dst"
defm V_CVT_PKNORM_I16_F32 : VOP2Inst <"v_cvt_pknorm_i16_f32", VOP_NO_EXT<VOP_V2I16_F32_F32>, AMDGPUpknorm_i16_f32>;		defm V_CVT_PKNORM_I16_F32 : VOP2Inst <"v_cvt_pknorm_i16_f32", VOP_NO_EXT<VOP_V2I16_F32_F32>, AMDGPUpknorm_i16_f32>;
defm V_CVT_PKNORM_U16_F32 : VOP2Inst <"v_cvt_pknorm_u16_f32", VOP_NO_EXT<VOP_V2I16_F32_F32>, AMDGPUpknorm_u16_f32>;		defm V_CVT_PKNORM_U16_F32 : VOP2Inst <"v_cvt_pknorm_u16_f32", VOP_NO_EXT<VOP_V2I16_F32_F32>, AMDGPUpknorm_u16_f32>;
defm V_CVT_PKRTZ_F16_F32 : VOP2Inst <"v_cvt_pkrtz_f16_f32", VOP_NO_EXT<VOP_V2F16_F32_F32>, AMDGPUpkrtz_f16_f32>;		defm V_CVT_PKRTZ_F16_F32 : VOP2Inst <"v_cvt_pkrtz_f16_f32", VOP_NO_EXT<VOP_V2F16_F32_F32>, AMDGPUpkrtz_f16_f32>;
defm V_CVT_PK_U16_U32 : VOP2Inst <"v_cvt_pk_u16_u32", VOP_NO_EXT<VOP_V2I16_I32_I32>, AMDGPUpk_u16_u32>;		defm V_CVT_PK_U16_U32 : VOP2Inst <"v_cvt_pk_u16_u32", VOP_NO_EXT<VOP_V2I16_I32_I32>, AMDGPUpk_u16_u32>;
defm V_CVT_PK_I16_I32 : VOP2Inst <"v_cvt_pk_i16_i32", VOP_NO_EXT<VOP_V2I16_I32_I32>, AMDGPUpk_i16_i32>;		defm V_CVT_PK_I16_I32 : VOP2Inst <"v_cvt_pk_i16_i32", VOP_NO_EXT<VOP_V2I16_I32_I32>, AMDGPUpk_i16_i32>;

} // End SubtargetPredicate = isGCN		} // End SubtargetPredicate = isGCN, Predicates = [isGCN]

def : GCNPat<		def : GCNPat<
(AMDGPUadde i32:$src0, i32:$src1, i1:$src2),		(AMDGPUadde i32:$src0, i32:$src1, i1:$src2),
(V_ADDC_U32_e64 $src0, $src1, $src2)		(V_ADDC_U32_e64 $src0, $src1, $src2)
>;		>;

def : GCNPat<		def : GCNPat<
(AMDGPUsube i32:$src0, i32:$src1, i1:$src2),		(AMDGPUsube i32:$src0, i32:$src1, i1:$src2),
(V_SUBB_U32_e64 $src0, $src1, $src2)		(V_SUBB_U32_e64 $src0, $src1, $src2)
>;		>;

// These instructions only exist on SI and CI		// These instructions only exist on SI and CI
let SubtargetPredicate = isSICI in {		let SubtargetPredicate = isSICI, Predicates = [isSICI] in {

defm V_MIN_LEGACY_F32 : VOP2Inst <"v_min_legacy_f32", VOP_F32_F32_F32, AMDGPUfmin_legacy>;		defm V_MIN_LEGACY_F32 : VOP2Inst <"v_min_legacy_f32", VOP_F32_F32_F32, AMDGPUfmin_legacy>;
defm V_MAX_LEGACY_F32 : VOP2Inst <"v_max_legacy_f32", VOP_F32_F32_F32, AMDGPUfmax_legacy>;		defm V_MAX_LEGACY_F32 : VOP2Inst <"v_max_legacy_f32", VOP_F32_F32_F32, AMDGPUfmax_legacy>;

let isCommutable = 1 in {		let isCommutable = 1 in {
defm V_MAC_LEGACY_F32 : VOP2Inst <"v_mac_legacy_f32", VOP_F32_F32_F32>;		defm V_MAC_LEGACY_F32 : VOP2Inst <"v_mac_legacy_f32", VOP_F32_F32_F32>;
defm V_LSHR_B32 : VOP2Inst <"v_lshr_b32", VOP_I32_I32_I32>;		defm V_LSHR_B32 : VOP2Inst <"v_lshr_b32", VOP_PAT_GEN<VOP_I32_I32_I32>, srl>;
defm V_ASHR_I32 : VOP2Inst <"v_ashr_i32", VOP_I32_I32_I32>;		defm V_ASHR_I32 : VOP2Inst <"v_ashr_i32", VOP_PAT_GEN<VOP_I32_I32_I32>, sra>;
defm V_LSHL_B32 : VOP2Inst <"v_lshl_b32", VOP_I32_I32_I32>;		defm V_LSHL_B32 : VOP2Inst <"v_lshl_b32", VOP_PAT_GEN<VOP_I32_I32_I32>, shl>;
} // End isCommutable = 1		} // End isCommutable = 1

} // End let SubtargetPredicate = SICI		} // End let SubtargetPredicate = SICI, Predicates = [isSICI]

		class DivergentBinOp<SDPatternOperator Op, VOP_Pseudo Inst> :
		GCNPat<
		(getDivergentFrag<Op>.ret Inst.Pfl.Src0VT:$src0, Inst.Pfl.Src1VT:$src1),
		!if(!cast<Commutable_REV>(Inst).IsOrig,
		(Inst $src0, $src1),
		(Inst $src1, $src0)
		)
		>;

		let AddedComplexity = 1 in {
		def : DivergentBinOp<srl, V_LSHRREV_B32_e64>;
		def : DivergentBinOp<sra, V_ASHRREV_I32_e64>;
		def : DivergentBinOp<shl, V_LSHLREV_B32_e64>;
		}

		let SubtargetPredicate = HasAddNoCarryInsts in {
		def : DivergentBinOp<add, V_ADD_U32_e32>;
		def : DivergentBinOp<sub, V_SUB_U32_e32>;
		def : DivergentBinOp<sub, V_SUBREV_U32_e32>;
		}


		def : DivergentBinOp<add, V_ADD_I32_e32>;

		def : DivergentBinOp<add, V_ADD_I32_e64>;
		def : DivergentBinOp<sub, V_SUB_I32_e32>;

		def : DivergentBinOp<sub, V_SUBREV_I32_e32>;

		def : DivergentBinOp<srl, V_LSHRREV_B32_e32>;
		def : DivergentBinOp<sra, V_ASHRREV_I32_e32>;
		def : DivergentBinOp<shl, V_LSHLREV_B32_e32>;
		def : DivergentBinOp<adde, V_ADDC_U32_e32>;
		def : DivergentBinOp<sube, V_SUBB_U32_e32>;

		class divergent_i64_BinOp <SDPatternOperator Op, Instruction Inst> :
		rampitec: Again, you probably do not need to move the adde and sube patterns.
		GCNPat<
		(getDivergentFrag<Op>.ret i64:$src0, i64:$src1),
		(REG_SEQUENCE VReg_64,
		(Inst
		(i32 (EXTRACT_SUBREG $src0, sub0)),
		(i32 (EXTRACT_SUBREG $src1, sub0))
		), sub0,
		(Inst
		(i32 (EXTRACT_SUBREG $src0, sub1)),
		(i32 (EXTRACT_SUBREG $src1, sub1))
		), sub1
		)
		>;
		rampitec: Why are this reg_sequence and the extract_subregs needed? Why not just use i64 operands?
		alex-t (author): How can I use i64 operands with V_AND_B32, which operates on i32?
		rampitec: OK, thanks.
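
The exchange above relies on bitwise and/or/xor acting on each bit independently, so a 64-bit operation can be done as two 32-bit operations on the low (sub0) and high (sub1) halves and then reassembled. The following is a minimal Python sketch of that idea only, not the TableGen pattern itself. The helper names `split64`, `join64`, and `bitwise64_via_32` are hypothetical and merely model what EXTRACT_SUBREG and REG_SEQUENCE express in the pattern.

```python
import operator

MASK32 = 0xFFFFFFFF

def split64(x):
    """Model EXTRACT_SUBREG: return (sub0, sub1), the low and high 32-bit halves."""
    return x & MASK32, (x >> 32) & MASK32

def join64(lo, hi):
    """Model REG_SEQUENCE: rebuild a 64-bit value from its two 32-bit halves."""
    return (hi << 32) | lo

def bitwise64_via_32(op, a, b):
    """Apply a 32-bit bitwise op to each half and recombine the results."""
    a_lo, a_hi = split64(a)
    b_lo, b_hi = split64(b)
    return join64(op(a_lo, b_lo) & MASK32, op(a_hi, b_hi) & MASK32)

a, b = 0x0123456789ABCDEF, 0xFEDCBA9876543210
for op in (operator.and_, operator.or_, operator.xor):
    # The split form must agree with the native 64-bit operation.
    assert bitwise64_via_32(op, a, b) == op(a, b)
```

The same split would not be valid for add or sub, because a carry or borrow can cross from the low half into the high half.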

		def : divergent_i64_BinOp <and, V_AND_B32_e32>;
		def : divergent_i64_BinOp <or, V_OR_B32_e32>;
		def : divergent_i64_BinOp <xor, V_XOR_B32_e32>;

let SubtargetPredicate = Has16BitInsts in {		let SubtargetPredicate = Has16BitInsts in {

def V_MADMK_F16 : VOP2_Pseudo <"v_madmk_f16", VOP_MADMK_F16, [], "">;		def V_MADMK_F16 : VOP2_Pseudo <"v_madmk_f16", VOP_MADMK_F16, [], "">;
defm V_LSHLREV_B16 : VOP2Inst <"v_lshlrev_b16", VOP_I16_I16_I16>;		defm V_LSHLREV_B16 : VOP2Inst <"v_lshlrev_b16", VOP_I16_I16_I16>;
defm V_LSHRREV_B16 : VOP2Inst <"v_lshrrev_b16", VOP_I16_I16_I16>;		defm V_LSHRREV_B16 : VOP2Inst <"v_lshrrev_b16", VOP_I16_I16_I16>;
defm V_ASHRREV_I16 : VOP2Inst <"v_ashrrev_i16", VOP_I16_I16_I16>;		defm V_ASHRREV_I16 : VOP2Inst <"v_ashrrev_i16", VOP_I16_I16_I16>;
defm V_LDEXP_F16 : VOP2Inst <"v_ldexp_f16", VOP_F16_F16_I32, AMDGPUldexp>;		defm V_LDEXP_F16 : VOP2Inst <"v_ldexp_f16", VOP_F16_F16_I32, AMDGPUldexp>;

lib/Target/AMDGPU/VOPInstructions.td

	class VOP_DPP <string OpName, VOPProfile P> :
let AssemblerPredicate = !if(P.HasExt, HasDPP, DisableInst);		let AssemblerPredicate = !if(P.HasExt, HasDPP, DisableInst);
let AsmVariantName = !if(P.HasExt, AMDGPUAsmVariants.DPP,		let AsmVariantName = !if(P.HasExt, AMDGPUAsmVariants.DPP,
AMDGPUAsmVariants.Disable);		AMDGPUAsmVariants.Disable);
let Constraints = !if(P.NumSrcArgs, "$old = $vdst", "");		let Constraints = !if(P.NumSrcArgs, "$old = $vdst", "");
let DisableEncoding = !if(P.NumSrcArgs, "$old", "");		let DisableEncoding = !if(P.NumSrcArgs, "$old", "");
let DecoderNamespace = "DPP";		let DecoderNamespace = "DPP";
}		}

		class getNumNodeArgs<SDPatternOperator Op> {
		SDNode N = !cast<SDNode>(Op);
		SDTypeProfile TP = N.TypeProfile;
		int ret = TP.NumOperands;
		}


		class getDivergentFrag<SDPatternOperator Op> {

		int NumSrcArgs = getNumNodeArgs<Op>.ret;
		PatFrag ret = PatFrag <
		!if(!eq(NumSrcArgs, 1),
		(ops node:$src0),
		!if(!eq(NumSrcArgs, 2),
		(ops node:$src0, node:$src1),
		(ops node:$src0, node:$src1, node:$src2))),
		!if(!eq(NumSrcArgs, 1),
		(Op $src0),
		!if(!eq(NumSrcArgs, 2),
		(Op $src0, $src1),
		(Op $src0, $src1, $src2))),
		[{ return N->isDivergent(); }]
		>;
		}

		class VOPPatGen<SDPatternOperator Op, VOPProfile P> {

		PatFrag Operator = getDivergentFrag < Op >.ret;

		dag Ins = !foreach(tmp, P.Ins32, !subst(ins, Operator,
		!subst(P.Src0RC32, P.Src0VT,
		!subst(P.Src1RC32, P.Src1VT, tmp))));


		dag Outs = !foreach(tmp, P.Outs32, !subst(outs, set,
		!subst(P.DstRC, P.DstVT, tmp)));

		list<dag> ret = [!con(Outs, (set Ins))];
		}

		class VOPPatOrNull<SDPatternOperator Op, VOPProfile P> {
		list<dag> ret = !if(!ne(P.NeedPatGen,PatGenMode.NoPattern), VOPPatGen<Op, P>.ret, []);
		}

include "VOPCInstructions.td"		include "VOPCInstructions.td"
include "VOP1Instructions.td"		include "VOP1Instructions.td"
		rampitec: Where do you set NeedPatGen = 1?
		alex-t (author): Okay. Everything related to this flag is going to be used later on, when I add the patterns for the extended encoding forms. I agree that it should be removed from this patch.
include "VOP2Instructions.td"		include "VOP2Instructions.td"
include "VOP3Instructions.td"		include "VOP3Instructions.td"
include "VOP3PInstructions.td"		include "VOP3PInstructions.td"

test/CodeGen/AMDGPU/add.ll

	entry:
%0 = add <16 x i32> %a, %b		%0 = add <16 x i32> %a, %b
store <16 x i32> %0, <16 x i32> addrspace(1)* %out		store <16 x i32> %0, <16 x i32> addrspace(1)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_add_i32:		; FUNC-LABEL: {{^}}v_add_i32:
; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]
; GCN: {{buffer\|flat\|global}}_load_dword [[B:v[0-9]+]]		; GCN: {{buffer\|flat\|global}}_load_dword [[B:v[0-9]+]]
; SIVI: v_add_{{i\|u}}32_e32 v{{[0-9]+}}, vcc, [[B]], [[A]]		; SIVI: v_add_{{i\|u}}32_e32 v{{[0-9]+}}, vcc, [[A]], [[B]]
; GFX9: v_add_u32_e32 v{{[0-9]+}}, [[A]], [[B]]		; GFX9: v_add_u32_e32 v{{[0-9]+}}, [[A]], [[B]]
define amdgpu_kernel void @v_add_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in) #0 {		define amdgpu_kernel void @v_add_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in) #0 {
%tid = call i32 @llvm.r600.read.tidig.x()		%tid = call i32 @llvm.r600.read.tidig.x()
%gep = getelementptr inbounds i32, i32 addrspace(1)* %in, i32 %tid		%gep = getelementptr inbounds i32, i32 addrspace(1)* %in, i32 %tid
%b_ptr = getelementptr i32, i32 addrspace(1)* %gep, i32 1		%b_ptr = getelementptr i32, i32 addrspace(1)* %gep, i32 1
%a = load volatile i32, i32 addrspace(1)* %gep		%a = load volatile i32, i32 addrspace(1)* %gep
%b = load volatile i32, i32 addrspace(1)* %b_ptr		%b = load volatile i32, i32 addrspace(1)* %b_ptr
%result = add i32 %a, %b		%result = add i32 %a, %b

test/CodeGen/AMDGPU/amdgcn.private-memory.ll

	; RUN: llc -mattr=+promote-alloca -verify-machineinstrs -march=amdgcn < %s \| FileCheck -check-prefix=GCN -check-prefix=GCN-PROMOTE %s			; RUN: llc -mattr=+promote-alloca -verify-machineinstrs -march=amdgcn < %s \| FileCheck -check-prefix=GCN -check-prefix=GCN-PROMOTE %s
	; RUN: llc -mattr=+promote-alloca,-flat-for-global -verify-machineinstrs -mtriple=amdgcn--amdhsa -mcpu=kaveri < %s \| FileCheck -check-prefix=GCN -check-prefix=GCN-PROMOTE -check-prefix=HSA %s			; RUN: llc -mattr=+promote-alloca,-flat-for-global -verify-machineinstrs -mtriple=amdgcn--amdhsa -mcpu=kaveri < %s \| FileCheck -check-prefix=GCN -check-prefix=GCN-PROMOTE -check-prefix=HSA %s
	; RUN: llc -mattr=-promote-alloca -verify-machineinstrs -march=amdgcn < %s \| FileCheck -check-prefix=GCN -check-prefix=GCN-ALLOCA %s			; RUN: llc -mattr=-promote-alloca -verify-machineinstrs -march=amdgcn < %s \| FileCheck -check-prefix=GCN -check-prefix=GCN-ALLOCA %s
	; RUN: llc -mattr=-promote-alloca,-flat-for-global -verify-machineinstrs -mtriple=amdgcn-amdhsa -mcpu=kaveri < %s \| FileCheck -check-prefix=GCN -check-prefix=GCN-ALLOCA -check-prefix=HSA %s			; RUN: llc -mattr=-promote-alloca,-flat-for-global -verify-machineinstrs -mtriple=amdgcn-amdhsa -mcpu=kaveri < %s \| FileCheck -check-prefix=GCN -check-prefix=GCN-ALLOCA -check-prefix=HSA %s
	; RUN: llc -mattr=+promote-alloca -verify-machineinstrs -march=amdgcn -mcpu=tonga -mattr=-flat-for-global < %s \| FileCheck -check-prefix=GCN -check-prefix=GCN-PROMOTE %s			; RUN: llc -mattr=+promote-alloca -verify-machineinstrs -march=amdgcn -mcpu=tonga -mattr=-flat-for-global < %s \| FileCheck -check-prefix=GCN -check-prefix=GCN-PROMOTE %s
	; RUN: llc -mattr=-promote-alloca -verify-machineinstrs -march=amdgcn -mcpu=tonga -mattr=-flat-for-global < %s \| FileCheck -check-prefix=GCN -check-prefix=GCN-ALLOCA %s			; RUN: llc -mattr=-promote-alloca -verify-machineinstrs -march=amdgcn -mcpu=tonga -mattr=-flat-for-global < %s \| FileCheck -check-prefix=GCN -check-prefix=GCN-ALLOCA %s


	declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone


	; Make sure we don't overwrite workitem information with private memory			; Make sure we don't overwrite workitem information with private memory

	; GCN-LABEL: {{^}}work_item_info:			; GCN-LABEL: {{^}}work_item_info:
	; GCN-NOT: v0			; GCN-NOT: v0
	; GCN: v_add_{{[iu]}}32_e32 [[RESULT:v[0-9]+]], vcc, v0, v{{[0-9]+}}			; GCN: v_add_{{[iu]}}32_e32 [[RESULT:v[0-9]+]], vcc, v{{[0-9]+}}, v0
	; GCN: buffer_store_dword [[RESULT]]			; GCN: buffer_store_dword [[RESULT]]
	define amdgpu_kernel void @work_item_info(i32 addrspace(1)* %out, i32 %in) {			define amdgpu_kernel void @work_item_info(i32 addrspace(1)* %out, i32 %in) {
	entry:			entry:
	%0 = alloca [2 x i32], addrspace(5)			%0 = alloca [2 x i32], addrspace(5)
	%1 = getelementptr [2 x i32], [2 x i32] addrspace(5)* %0, i32 0, i32 0			%1 = getelementptr [2 x i32], [2 x i32] addrspace(5)* %0, i32 0, i32 0
	%2 = getelementptr [2 x i32], [2 x i32] addrspace(5)* %0, i32 0, i32 1			%2 = getelementptr [2 x i32], [2 x i32] addrspace(5)* %0, i32 0, i32 1
	store i32 0, i32 addrspace(5)* %1			store i32 0, i32 addrspace(5)* %1
	store i32 1, i32 addrspace(5)* %2			store i32 1, i32 addrspace(5)* %2
	%3 = getelementptr [2 x i32], [2 x i32] addrspace(5)* %0, i32 0, i32 %in			%3 = getelementptr [2 x i32], [2 x i32] addrspace(5)* %0, i32 0, i32 %in
	%4 = load i32, i32 addrspace(5)* %3			%4 = load i32, i32 addrspace(5)* %3
	%5 = call i32 @llvm.amdgcn.workitem.id.x()			%5 = call i32 @llvm.amdgcn.workitem.id.x()
	%6 = add i32 %4, %5			%6 = add i32 %4, %5
	store i32 %6, i32 addrspace(1)* %out			store i32 %6, i32 addrspace(1)* %out
	ret void			ret void
	}			}

test/CodeGen/AMDGPU/bfe-patterns.ll

	define amdgpu_kernel void @v_ubfe_sub_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in0, i32 addrspace(1)* %in1) #1 {
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_ubfe_sub_multi_use_shl_i32:		; GCN-LABEL: {{^}}v_ubfe_sub_multi_use_shl_i32:
; GCN: {{buffer\|flat}}_load_dword [[SRC:v[0-9]+]]		; GCN: {{buffer\|flat}}_load_dword [[SRC:v[0-9]+]]
; GCN: {{buffer\|flat}}_load_dword [[WIDTH:v[0-9]+]]		; GCN: {{buffer\|flat}}_load_dword [[WIDTH:v[0-9]+]]
; GCN: v_sub_{{[iu]}}32_e32 [[SUB:v[0-9]+]], vcc, 32, [[WIDTH]]		; GCN: v_sub_{{[iu]}}32_e32 [[SUB:v[0-9]+]], vcc, 32, [[WIDTH]]

; SI-NEXT: v_lshl_b32_e32 [[SHL:v[0-9]+]], [[SRC]], [[SUB]]		; GCN-NEXT: v_lshlrev_b32_e32 [[SHL:v[0-9]+]], [[SUB]], [[SRC]]
; SI-NEXT: v_lshr_b32_e32 [[BFE:v[0-9]+]], [[SHL]], [[SUB]]		; GCN-NEXT: v_lshrrev_b32_e32 [[BFE:v[0-9]+]], [[SUB]], [[SHL]]

; VI-NEXT: v_lshlrev_b32_e32 [[SHL:v[0-9]+]], [[SUB]], [[SRC]]
; VI-NEXT: v_lshrrev_b32_e32 [[BFE:v[0-9]+]], [[SUB]], [[SHL]]

; GCN: [[BFE]]		; GCN: [[BFE]]
; GCN: [[SHL]]		; GCN: [[SHL]]
define amdgpu_kernel void @v_ubfe_sub_multi_use_shl_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in0, i32 addrspace(1)* %in1) #1 {		define amdgpu_kernel void @v_ubfe_sub_multi_use_shl_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in0, i32 addrspace(1)* %in1) #1 {
%id.x = tail call i32 @llvm.amdgcn.workitem.id.x()		%id.x = tail call i32 @llvm.amdgcn.workitem.id.x()
%in0.gep = getelementptr i32, i32 addrspace(1)* %in0, i32 %id.x		%in0.gep = getelementptr i32, i32 addrspace(1)* %in0, i32 %id.x
%in1.gep = getelementptr i32, i32 addrspace(1)* %in1, i32 %id.x		%in1.gep = getelementptr i32, i32 addrspace(1)* %in1, i32 %id.x
%out.gep = getelementptr i32, i32 addrspace(1)* %out, i32 %id.x		%out.gep = getelementptr i32, i32 addrspace(1)* %out, i32 %id.x
	define amdgpu_kernel void @v_sbfe_sub_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in0, i32 addrspace(1)* %in1) #1 {
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_sbfe_sub_multi_use_shl_i32:		; GCN-LABEL: {{^}}v_sbfe_sub_multi_use_shl_i32:
; GCN: {{buffer\|flat}}_load_dword [[SRC:v[0-9]+]]		; GCN: {{buffer\|flat}}_load_dword [[SRC:v[0-9]+]]
; GCN: {{buffer\|flat}}_load_dword [[WIDTH:v[0-9]+]]		; GCN: {{buffer\|flat}}_load_dword [[WIDTH:v[0-9]+]]
; GCN: v_sub_{{[iu]}}32_e32 [[SUB:v[0-9]+]], vcc, 32, [[WIDTH]]		; GCN: v_sub_{{[iu]}}32_e32 [[SUB:v[0-9]+]], vcc, 32, [[WIDTH]]

; SI-NEXT: v_lshl_b32_e32 [[SHL:v[0-9]+]], [[SRC]], [[SUB]]		; GCN-NEXT: v_lshlrev_b32_e32 [[SHL:v[0-9]+]], [[SUB]], [[SRC]]
; SI-NEXT: v_ashr_i32_e32 [[BFE:v[0-9]+]], [[SHL]], [[SUB]]		; GCN-NEXT: v_ashrrev_i32_e32 [[BFE:v[0-9]+]], [[SUB]], [[SHL]]

; VI-NEXT: v_lshlrev_b32_e32 [[SHL:v[0-9]+]], [[SUB]], [[SRC]]
; VI-NEXT: v_ashrrev_i32_e32 [[BFE:v[0-9]+]], [[SUB]], [[SHL]]

; GCN: [[BFE]]		; GCN: [[BFE]]
; GCN: [[SHL]]		; GCN: [[SHL]]
define amdgpu_kernel void @v_sbfe_sub_multi_use_shl_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in0, i32 addrspace(1)* %in1) #1 {		define amdgpu_kernel void @v_sbfe_sub_multi_use_shl_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in0, i32 addrspace(1)* %in1) #1 {
%id.x = tail call i32 @llvm.amdgcn.workitem.id.x()		%id.x = tail call i32 @llvm.amdgcn.workitem.id.x()
%in0.gep = getelementptr i32, i32 addrspace(1)* %in0, i32 %id.x		%in0.gep = getelementptr i32, i32 addrspace(1)* %in0, i32 %id.x
%in1.gep = getelementptr i32, i32 addrspace(1)* %in1, i32 %id.x		%in1.gep = getelementptr i32, i32 addrspace(1)* %in1, i32 %id.x
%out.gep = getelementptr i32, i32 addrspace(1)* %out, i32 %id.x		%out.gep = getelementptr i32, i32 addrspace(1)* %out, i32 %id.x

test/CodeGen/AMDGPU/ctpop64.ll

	; VI: flat_load_dwordx4 v{{\[}}[[VAL0:[0-9]+]]:[[VAL3:[0-9]+]]{{\]}}, v{{\[[0-9]+:[0-9]+\]}}			; VI: flat_load_dwordx4 v{{\[}}[[VAL0:[0-9]+]]:[[VAL3:[0-9]+]]{{\]}}, v{{\[[0-9]+:[0-9]+\]}}

	; GCN-DAG: v_bcnt_u32_b32{{(_e64)*}} [[MIDRESULT0:v[0-9]+]], v{{[0-9]+}}, 0			; GCN-DAG: v_bcnt_u32_b32{{(_e64)*}} [[MIDRESULT0:v[0-9]+]], v{{[0-9]+}}, 0
	; GCN-DAG: v_bcnt_u32_b32{{(_e32)(_e64)}} [[MIDRESULT1:v[0-9]+]], v[[VAL3]], [[MIDRESULT0]]			; GCN-DAG: v_bcnt_u32_b32{{(_e32)(_e64)}} [[MIDRESULT1:v[0-9]+]], v[[VAL3]], [[MIDRESULT0]]

	; GCN-DAG: v_bcnt_u32_b32{{(_e64)*}} [[MIDRESULT2:v[0-9]+]], v[[VAL0]], 0			; GCN-DAG: v_bcnt_u32_b32{{(_e64)*}} [[MIDRESULT2:v[0-9]+]], v[[VAL0]], 0
	; GCN-DAG: v_bcnt_u32_b32{{(_e32)(_e64)}} [[MIDRESULT3:v[0-9]+]], v{{[0-9]+}}, [[MIDRESULT2]]			; GCN-DAG: v_bcnt_u32_b32{{(_e32)(_e64)}} [[MIDRESULT3:v[0-9]+]], v{{[0-9]+}}, [[MIDRESULT2]]

	; GCN: v_add_{{[iu]}}32_e32 [[RESULT:v[0-9]+]], vcc, [[MIDRESULT1]], [[MIDRESULT2]]			; GCN: v_add_{{[iu]}}32_e32 [[RESULT:v[0-9]+]], vcc, [[MIDRESULT2]], [[MIDRESULT1]]

	; GCN: buffer_store_dword [[RESULT]],			; GCN: buffer_store_dword [[RESULT]],
	; GCN: s_endpgm			; GCN: s_endpgm
	define amdgpu_kernel void @v_ctpop_i128(i32 addrspace(1)* noalias %out, i128 addrspace(1)* noalias %in) nounwind {			define amdgpu_kernel void @v_ctpop_i128(i32 addrspace(1)* noalias %out, i128 addrspace(1)* noalias %in) nounwind {
	%tid = call i32 @llvm.r600.read.tidig.x()			%tid = call i32 @llvm.r600.read.tidig.x()
	%in.gep = getelementptr i128, i128 addrspace(1)* %in, i32 %tid			%in.gep = getelementptr i128, i128 addrspace(1)* %in, i32 %tid
	%val = load i128, i128 addrspace(1)* %in.gep, align 8			%val = load i128, i128 addrspace(1)* %in.gep, align 8
	%ctpop = call i128 @llvm.ctpop.i128(i128 %val) nounwind readnone			%ctpop = call i128 @llvm.ctpop.i128(i128 %val) nounwind readnone
	%truncctpop = trunc i128 %ctpop to i32			%truncctpop = trunc i128 %ctpop to i32
	store i32 %truncctpop, i32 addrspace(1)* %out, align 4			store i32 %truncctpop, i32 addrspace(1)* %out, align 4
	ret void			ret void
	}			}

test/CodeGen/AMDGPU/extract-lowbits.ll

	}			}

	define i32 @bzhi32_d1_indexzext(i32 %val, i8 %numlowbits) nounwind {			define i32 @bzhi32_d1_indexzext(i32 %val, i8 %numlowbits) nounwind {
	; SI-LABEL: bzhi32_d1_indexzext:			; SI-LABEL: bzhi32_d1_indexzext:
	; SI: ; %bb.0:			; SI: ; %bb.0:
	; SI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; SI-NEXT: v_sub_i32_e32 v1, vcc, 32, v1			; SI-NEXT: v_sub_i32_e32 v1, vcc, 32, v1
	; SI-NEXT: v_and_b32_e32 v1, 0xff, v1			; SI-NEXT: v_and_b32_e32 v1, 0xff, v1
	; SI-NEXT: v_lshl_b32_e32 v0, v0, v1			; SI-NEXT: v_lshlrev_b32_e32 v0, v1, v0
	; SI-NEXT: v_lshr_b32_e32 v0, v0, v1			; SI-NEXT: v_lshrrev_b32_e32 v0, v1, v0
	; SI-NEXT: s_setpc_b64 s[30:31]			; SI-NEXT: s_setpc_b64 s[30:31]
	;			;
	; VI-LABEL: bzhi32_d1_indexzext:			; VI-LABEL: bzhi32_d1_indexzext:
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; VI-NEXT: v_sub_u16_e32 v1, 32, v1			; VI-NEXT: v_sub_u16_e32 v1, 32, v1
	; VI-NEXT: v_and_b32_e32 v1, 0xff, v1			; VI-NEXT: v_and_b32_e32 v1, 0xff, v1
	; VI-NEXT: v_lshlrev_b32_e32 v0, v1, v0			; VI-NEXT: v_lshlrev_b32_e32 v0, v1, v0
	; VI-NEXT: v_lshrrev_b32_e32 v0, v1, v0			; VI-NEXT: v_lshrrev_b32_e32 v0, v1, v0
	; VI-NEXT: s_setpc_b64 s[30:31]			; VI-NEXT: s_setpc_b64 s[30:31]
	%numhighbits = sub i8 32, %numlowbits			%numhighbits = sub i8 32, %numlowbits
	%sh_prom = zext i8 %numhighbits to i32			%sh_prom = zext i8 %numhighbits to i32
	%highbitscleared = shl i32 %val, %sh_prom			%highbitscleared = shl i32 %val, %sh_prom
	%masked = lshr i32 %highbitscleared, %sh_prom			%masked = lshr i32 %highbitscleared, %sh_prom
	ret i32 %masked			ret i32 %masked
	}			}

test/CodeGen/AMDGPU/fabs.f16.ll

	}			}

	; FIXME: Should do fabs after conversion to avoid converting multiple			; FIXME: Should do fabs after conversion to avoid converting multiple
	; times in this particular case.			; times in this particular case.

	; GCN-LABEL: {{^}}v_fabs_fold_self_v2f16:			; GCN-LABEL: {{^}}v_fabs_fold_self_v2f16:
	; GCN: {{flat\|global}}_load_dword [[VAL:v[0-9]+]]			; GCN: {{flat\|global}}_load_dword [[VAL:v[0-9]+]]

	; CI: v_lshrrev_b32_e32 v{{[0-9]+}}, 16, v{{[0-9]+}}			; CI: v_lshrrev_b32_e32 [[VREG:v[0-9]+]], 16, v{{[0-9]+}}
	; CI: v_lshrrev_b32_e32 v{{[0-9]+}}, 16, v{{[0-9]+}}			; CI: v_cvt_f32_f16_e32 [[NORM:v[0-9]+]], [[VREG]]
	; CI: v_cvt_f32_f16_e32			; CI: v_cvt_f32_f16_e64 [[ABS:v[0-9]+]], {{\\|}}[[VREG]]{{\\|}}
	; CI: v_cvt_f32_f16_e32			; CI: v_mul_f32_e32 v{{[0-9]+}}, [[ABS]], [[NORM]]
	; CI: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
	; CI: v_cvt_f16_f32			; CI: v_cvt_f16_f32
	; CI: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}			; CI: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
	; CI: v_cvt_f16_f32			; CI: v_cvt_f16_f32

	; VI: v_mul_f16_sdwa v{{[0-9]+}}, \|v{{[0-9]+}}\|, v{{[0-9]+}} dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1			; VI: v_mul_f16_sdwa v{{[0-9]+}}, \|v{{[0-9]+}}\|, v{{[0-9]+}} dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
	; VI: v_mul_f16_e64 v{{[0-9]+}}, \|v{{[0-9]+}}\|, v{{[0-9]+}}			; VI: v_mul_f16_e64 v{{[0-9]+}}, \|v{{[0-9]+}}\|, v{{[0-9]+}}

	; GFX9: v_and_b32_e32 [[FABS:v[0-9]+]], 0x7fff7fff, [[VAL]]			; GFX9: v_and_b32_e32 [[FABS:v[0-9]+]], 0x7fff7fff, [[VAL]]

test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll

; GCN-DAG: {{flat\|global}}_load_dword [[VEC:v[0-9]+]]		; GCN-DAG: {{flat\|global}}_load_dword [[VEC:v[0-9]+]]

; GFX9-DAG: s_movk_i32 [[K:s[0-9]+]], 0x3e7		; GFX9-DAG: s_movk_i32 [[K:s[0-9]+]], 0x3e7
; GFX9-DAG: v_and_b32_e32 [[ELT0:v[0-9]+]], 0xffff, [[VEC]]		; GFX9-DAG: v_and_b32_e32 [[ELT0:v[0-9]+]], 0xffff, [[VEC]]
; GFX9: v_lshl_or_b32 [[RES:v[0-9]+]], [[K]], 16, [[ELT0]]		; GFX9: v_lshl_or_b32 [[RES:v[0-9]+]], [[K]], 16, [[ELT0]]

; CI: v_and_b32_e32 [[AND:v[0-9]+]], 0xffff, [[VEC]]		; CI: v_and_b32_e32 [[AND:v[0-9]+]], 0xffff, [[VEC]]
; CI: v_or_b32_e32 [[RES:v[0-9]+]], 0x3e70000, [[AND]]		; CI: v_or_b32_e32 [[RES:v[0-9]+]], 0x3e70000, [[AND]]
; VI: v_or_b32_sdwa [[RES:v[0-9]+]], [[VEC]], [[K]] dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD		; VI: v_or_b32_sdwa [[RES:v[0-9]+]], [[K]], [[VEC]] dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0

; GCN: {{flat\|global}}_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RES]]		; GCN: {{flat\|global}}_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RES]]
define amdgpu_kernel void @v_insertelement_v2i16_1(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in) #0 {		define amdgpu_kernel void @v_insertelement_v2i16_1(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x() #1		%tid = call i32 @llvm.amdgcn.workitem.id.x() #1
%tid.ext = sext i32 %tid to i64		%tid.ext = sext i32 %tid to i64
%in.gep = getelementptr inbounds <2 x i16>, <2 x i16> addrspace(1)* %in, i64 %tid.ext		%in.gep = getelementptr inbounds <2 x i16>, <2 x i16> addrspace(1)* %in, i64 %tid.ext
%out.gep = getelementptr inbounds <2 x i16>, <2 x i16> addrspace(1)* %out, i64 %tid.ext		%out.gep = getelementptr inbounds <2 x i16>, <2 x i16> addrspace(1)* %out, i64 %tid.ext
%vec = load <2 x i16>, <2 x i16> addrspace(1)* %in.gep		%vec = load <2 x i16>, <2 x i16> addrspace(1)* %in.gep
%vecins = insertelement <2 x i16> %vec, i16 999, i32 1		%vecins = insertelement <2 x i16> %vec, i16 999, i32 1
store <2 x i16> %vecins, <2 x i16> addrspace(1)* %out.gep		store <2 x i16> %vecins, <2 x i16> addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_insertelement_v2i16_1_inlineimm:		; GCN-LABEL: {{^}}v_insertelement_v2i16_1_inlineimm:
; VI: v_mov_b32_e32 [[K:v[0-9]+]], 0xfff10000		; VI: v_mov_b32_e32 [[K:v[0-9]+]], 0xfff10000
; GCN: {{flat\|global}}_load_dword [[VEC:v[0-9]+]]		; GCN: {{flat\|global}}_load_dword [[VEC:v[0-9]+]]
; CI: v_and_b32_e32 [[ELT0:v[0-9]+]], 0xffff, [[VEC]]		; CI: v_and_b32_e32 [[ELT0:v[0-9]+]], 0xffff, [[VEC]]
; GFX9: v_and_b32_e32 [[ELT0:v[0-9]+]], 0xffff, [[VEC]]		; GFX9: v_and_b32_e32 [[ELT0:v[0-9]+]], 0xffff, [[VEC]]
; CI: v_or_b32_e32 [[RES:v[0-9]+]], 0xfff10000, [[ELT0]]		; CI: v_or_b32_e32 [[RES:v[0-9]+]], 0xfff10000, [[ELT0]]
; VI: v_or_b32_sdwa [[RES:v[0-9]+]], [[VEC]], [[K]] dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD		; VI: v_or_b32_sdwa [[RES:v[0-9]+]], [[K]], [[VEC]] dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0
; GFX9: v_lshl_or_b32 [[RES:v[0-9]+]], -15, 16, [[ELT0]]		; GFX9: v_lshl_or_b32 [[RES:v[0-9]+]], -15, 16, [[ELT0]]
; GCN: {{flat\|global}}_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RES]]		; GCN: {{flat\|global}}_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RES]]
define amdgpu_kernel void @v_insertelement_v2i16_1_inlineimm(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in) #0 {		define amdgpu_kernel void @v_insertelement_v2i16_1_inlineimm(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x() #1		%tid = call i32 @llvm.amdgcn.workitem.id.x() #1
%tid.ext = sext i32 %tid to i64		%tid.ext = sext i32 %tid to i64
%in.gep = getelementptr inbounds <2 x i16>, <2 x i16> addrspace(1)* %in, i64 %tid.ext		%in.gep = getelementptr inbounds <2 x i16>, <2 x i16> addrspace(1)* %in, i64 %tid.ext
%out.gep = getelementptr inbounds <2 x i16>, <2 x i16> addrspace(1)* %out, i64 %tid.ext		%out.gep = getelementptr inbounds <2 x i16>, <2 x i16> addrspace(1)* %out, i64 %tid.ext
%vec = load <2 x i16>, <2 x i16> addrspace(1)* %in.gep		%vec = load <2 x i16>, <2 x i16> addrspace(1)* %in.gep

; GFX9-DAG: s_movk_i32 [[K:s[0-9]+]], 0x4500		; GFX9-DAG: s_movk_i32 [[K:s[0-9]+]], 0x4500
; GFX9-DAG: v_and_b32_e32 [[ELT0:v[0-9]+]], 0xffff, [[VEC]]		; GFX9-DAG: v_and_b32_e32 [[ELT0:v[0-9]+]], 0xffff, [[VEC]]
; GFX9: v_lshl_or_b32 [[RES:v[0-9]+]], [[K]], 16, [[ELT0]]		; GFX9: v_lshl_or_b32 [[RES:v[0-9]+]], [[K]], 16, [[ELT0]]

; CI: v_and_b32_e32 [[AND:v[0-9]+]], 0xffff, [[VEC]]		; CI: v_and_b32_e32 [[AND:v[0-9]+]], 0xffff, [[VEC]]
; CI: v_or_b32_e32 [[RES:v[0-9]+]], 0x45000000, [[AND]]		; CI: v_or_b32_e32 [[RES:v[0-9]+]], 0x45000000, [[AND]]

; VI: v_or_b32_sdwa [[RES:v[0-9]+]], [[VEC]], [[K]] dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD		; VI: v_or_b32_sdwa [[RES:v[0-9]+]], [[K]], [[VEC]] dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0

; GCN: {{flat\|global}}_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RES]]		; GCN: {{flat\|global}}_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RES]]
define amdgpu_kernel void @v_insertelement_v2f16_1(<2 x half> addrspace(1)* %out, <2 x half> addrspace(1)* %in) #0 {		define amdgpu_kernel void @v_insertelement_v2f16_1(<2 x half> addrspace(1)* %out, <2 x half> addrspace(1)* %in) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x() #1		%tid = call i32 @llvm.amdgcn.workitem.id.x() #1
%tid.ext = sext i32 %tid to i64		%tid.ext = sext i32 %tid to i64
%in.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %in, i64 %tid.ext		%in.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %in, i64 %tid.ext
%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext		%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext
%vec = load <2 x half>, <2 x half> addrspace(1)* %in.gep		%vec = load <2 x half>, <2 x half> addrspace(1)* %in.gep
%vecins = insertelement <2 x half> %vec, half 5.000000e+00, i32 1		%vecins = insertelement <2 x half> %vec, half 5.000000e+00, i32 1
store <2 x half> %vecins, <2 x half> addrspace(1)* %out.gep		store <2 x half> %vecins, <2 x half> addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_insertelement_v2f16_1_inlineimm:		; GCN-LABEL: {{^}}v_insertelement_v2f16_1_inlineimm:
; VI: v_mov_b32_e32 [[K:v[0-9]+]], 0x230000		; VI: v_mov_b32_e32 [[K:v[0-9]+]], 0x230000
; GCN: {{flat\|global}}_load_dword [[VEC:v[0-9]+]]		; GCN: {{flat\|global}}_load_dword [[VEC:v[0-9]+]]
; CI: v_and_b32_e32 [[ELT0:v[0-9]+]], 0xffff, [[VEC]]		; CI: v_and_b32_e32 [[ELT0:v[0-9]+]], 0xffff, [[VEC]]
; GFX9: v_and_b32_e32 [[ELT0:v[0-9]+]], 0xffff, [[VEC]]		; GFX9: v_and_b32_e32 [[ELT0:v[0-9]+]], 0xffff, [[VEC]]
; CI: v_or_b32_e32 [[RES:v[0-9]+]], 0x230000, [[ELT0]]		; CI: v_or_b32_e32 [[RES:v[0-9]+]], 0x230000, [[ELT0]]
; VI: v_or_b32_sdwa [[RES:v[0-9]+]], [[VEC]], [[K]] dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD		; VI: v_or_b32_sdwa [[RES:v[0-9]+]], [[K]], [[VEC]] dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0
; GFX9: v_lshl_or_b32 [[RES:v[0-9]+]], 35, 16, [[ELT0]]		; GFX9: v_lshl_or_b32 [[RES:v[0-9]+]], 35, 16, [[ELT0]]
; GCN: {{flat\|global}}_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RES]]		; GCN: {{flat\|global}}_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RES]]
define amdgpu_kernel void @v_insertelement_v2f16_1_inlineimm(<2 x half> addrspace(1)* %out, <2 x half> addrspace(1)* %in) #0 {		define amdgpu_kernel void @v_insertelement_v2f16_1_inlineimm(<2 x half> addrspace(1)* %out, <2 x half> addrspace(1)* %in) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x() #1		%tid = call i32 @llvm.amdgcn.workitem.id.x() #1
%tid.ext = sext i32 %tid to i64		%tid.ext = sext i32 %tid to i64
%in.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %in, i64 %tid.ext		%in.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %in, i64 %tid.ext
%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext		%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext
%vec = load <2 x half>, <2 x half> addrspace(1)* %in.gep		%vec = load <2 x half>, <2 x half> addrspace(1)* %in.gep
define amdgpu_kernel void @v_insertelement_v2i16_dynamic_sgpr(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in, i32 %idx) #0 {
%vecins = insertelement <2 x i16> %vec, i16 999, i32 %idx		%vecins = insertelement <2 x i16> %vec, i16 999, i32 %idx
store <2 x i16> %vecins, <2 x i16> addrspace(1)* %out.gep		store <2 x i16> %vecins, <2 x i16> addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_insertelement_v2i16_dynamic_vgpr:		; GCN-LABEL: {{^}}v_insertelement_v2i16_dynamic_vgpr:
; GFX89-DAG: s_mov_b32 [[MASKK:s[0-9]+]], 0xffff{{$}}		; GFX89-DAG: s_mov_b32 [[MASKK:s[0-9]+]], 0xffff{{$}}
; GCN-DAG: s_movk_i32 [[K:s[0-9]+]], 0x3e7		; GCN-DAG: s_movk_i32 [[K:s[0-9]+]], 0x3e7

		rampitec: It is the same, isn't it? Why not keep it GCN-DAG?
; GCN: {{flat\|global}}_load_dword [[IDX:v[0-9]+]]		; GCN: {{flat\|global}}_load_dword [[IDX:v[0-9]+]]
; GCN: {{flat\|global}}_load_dword [[VEC:v[0-9]+]]		; GCN: {{flat\|global}}_load_dword [[VEC:v[0-9]+]]

; GFX89-DAG: v_lshlrev_b32_e32 [[SCALED_IDX:v[0-9]+]], 4, [[IDX]]		; GFX89-DAG: v_lshlrev_b32_e32 [[SCALED_IDX:v[0-9]+]], 4, [[IDX]]
; GFX89-DAG: v_lshlrev_b32_e64 [[MASK:v[0-9]+]], [[SCALED_IDX]], [[MASKK]]		; GFX89-DAG: v_lshlrev_b32_e64 [[MASK:v[0-9]+]], [[SCALED_IDX]], [[MASKK]]

; CI-DAG: v_lshlrev_b32_e32 [[SCALED_IDX:v[0-9]+]], 4, [[IDX]]		; CI-DAG: v_lshlrev_b32_e32 [[SCALED_IDX:v[0-9]+]], 4, [[IDX]]
; CI-DAG: v_lshl_b32_e32 [[MASK:v[0-9]+]], 0xffff, [[SCALED_IDX]]		; CI-DAG: v_lshl_b32_e32 [[MASK:v[0-9]+]], 0xffff, [[SCALED_IDX]]
define amdgpu_kernel void @v_insertelement_v2i16_dynamic_vgpr(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in, i32 addrspace(1)* %idx.ptr) #0 {
%vecins = insertelement <2 x i16> %vec, i16 999, i32 %idx		%vecins = insertelement <2 x i16> %vec, i16 999, i32 %idx
store <2 x i16> %vecins, <2 x i16> addrspace(1)* %out.gep		store <2 x i16> %vecins, <2 x i16> addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_insertelement_v2f16_dynamic_vgpr:		; GCN-LABEL: {{^}}v_insertelement_v2f16_dynamic_vgpr:
; GFX89-DAG: s_mov_b32 [[MASKK:s[0-9]+]], 0xffff{{$}}		; GFX89-DAG: s_mov_b32 [[MASKK:s[0-9]+]], 0xffff{{$}}
; GCN-DAG: s_movk_i32 [[K:s[0-9]+]], 0x1234		; GCN-DAG: s_movk_i32 [[K:s[0-9]+]], 0x1234

		rampitec: Same here.
; GCN: {{flat\|global}}_load_dword [[IDX:v[0-9]+]]		; GCN: {{flat\|global}}_load_dword [[IDX:v[0-9]+]]
; GCN: {{flat\|global}}_load_dword [[VEC:v[0-9]+]]		; GCN: {{flat\|global}}_load_dword [[VEC:v[0-9]+]]

; GFX89-DAG: v_lshlrev_b32_e32 [[SCALED_IDX:v[0-9]+]], 4, [[IDX]]		; GFX89-DAG: v_lshlrev_b32_e32 [[SCALED_IDX:v[0-9]+]], 4, [[IDX]]
; GFX89-DAG: v_lshlrev_b32_e64 [[MASK:v[0-9]+]], [[SCALED_IDX]], [[MASKK]]		; GFX89-DAG: v_lshlrev_b32_e64 [[MASK:v[0-9]+]], [[SCALED_IDX]], [[MASKK]]

; CI-DAG: v_lshlrev_b32_e32 [[SCALED_IDX:v[0-9]+]], 4, [[IDX]]		; CI-DAG: v_lshlrev_b32_e32 [[SCALED_IDX:v[0-9]+]], 4, [[IDX]]
; CI-DAG: v_lshl_b32_e32 [[MASK:v[0-9]+]], 0xffff, [[SCALED_IDX]]		; CI-DAG: v_lshl_b32_e32 [[MASK:v[0-9]+]], 0xffff, [[SCALED_IDX]]
; GCN-DAG: s_load_dword [[VAL:s[0-9]+]]		; GCN-DAG: s_load_dword [[VAL:s[0-9]+]]
; GCN-DAG: {{flat\|global}}_load_dwordx2 v{{\[}}[[LO:[0-9]+]]:[[HI:[0-9]+]]{{\]}}		; GCN-DAG: {{flat\|global}}_load_dwordx2 v{{\[}}[[LO:[0-9]+]]:[[HI:[0-9]+]]{{\]}}

; GFX9: v_and_b32_e32 [[AND:v[0-9]+]], 0xffff, v[[LO]]		; GFX9: v_and_b32_e32 [[AND:v[0-9]+]], 0xffff, v[[LO]]
; GFX9: v_lshl_or_b32 v[[INS_HALF:[0-9]+]], [[VAL]], 16, [[AND]]		; GFX9: v_lshl_or_b32 v[[INS_HALF:[0-9]+]], [[VAL]], 16, [[AND]]

; VI: s_lshl_b32 [[VAL_HI:s[0-9]+]], [[VAL]], 16		; VI: s_lshl_b32 [[VAL_HI:s[0-9]+]], [[VAL]], 16
; VI-DAG: v_mov_b32_e32 [[COPY_VAL:v[0-9]+]], [[VAL_HI]]		; VI-DAG: v_mov_b32_e32 [[COPY_VAL:v[0-9]+]], [[VAL_HI]]
; VI: v_or_b32_sdwa v[[INS_HALF:[0-9]+]], v[[LO]], [[COPY_VAL]] dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD		; VI: v_or_b32_sdwa v[[INS_HALF:[0-9]+]], [[COPY_VAL]], v[[LO]] dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0

; CI: s_lshl_b32 [[VAL_HI:s[0-9]+]], [[VAL]], 16		; CI: s_lshl_b32 [[VAL_HI:s[0-9]+]], [[VAL]], 16
; CI: v_and_b32_e32 [[AND:v[0-9]+]], 0xffff, v[[LO]]		; CI: v_and_b32_e32 [[AND:v[0-9]+]], 0xffff, v[[LO]]
; CI: v_or_b32_e32 v[[INS_HALF:[0-9]+]], [[VAL_HI]], [[AND]]		; CI: v_or_b32_e32 v[[INS_HALF:[0-9]+]], [[VAL_HI]], [[AND]]

; GCN: {{flat\|global}}_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, v{{\[}}[[INS_HALF]]:[[HI]]{{\]}}		; GCN: {{flat\|global}}_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, v{{\[}}[[INS_HALF]]:[[HI]]{{\]}}
define amdgpu_kernel void @v_insertelement_v4f16_1(<4 x half> addrspace(1)* %out, <4 x half> addrspace(1)* %in, i32 %val) #0 {		define amdgpu_kernel void @v_insertelement_v4f16_1(<4 x half> addrspace(1)* %out, <4 x half> addrspace(1)* %in, i32 %val) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x() #1		%tid = call i32 @llvm.amdgcn.workitem.id.x() #1
; GCN-DAG: s_load_dword [[VAL:s[0-9]+]]		; GCN-DAG: s_load_dword [[VAL:s[0-9]+]]
; GCN-DAG: {{flat\|global}}_load_dwordx2 v{{\[}}[[LO:[0-9]+]]:[[HI:[0-9]+]]{{\]}}		; GCN-DAG: {{flat\|global}}_load_dwordx2 v{{\[}}[[LO:[0-9]+]]:[[HI:[0-9]+]]{{\]}}

; GFX9: v_and_b32_e32 [[AND:v[0-9]+]], 0xffff, v[[HI]]		; GFX9: v_and_b32_e32 [[AND:v[0-9]+]], 0xffff, v[[HI]]
; GFX9: v_lshl_or_b32 v[[INS_HI:[0-9]+]], [[VAL]], 16, [[AND]]		; GFX9: v_lshl_or_b32 v[[INS_HI:[0-9]+]], [[VAL]], 16, [[AND]]

; VI: s_lshl_b32 [[VAL_HI:s[0-9]+]], [[VAL]], 16		; VI: s_lshl_b32 [[VAL_HI:s[0-9]+]], [[VAL]], 16
; VI-DAG: v_mov_b32_e32 [[COPY_VAL:v[0-9]+]], [[VAL_HI]]		; VI-DAG: v_mov_b32_e32 [[COPY_VAL:v[0-9]+]], [[VAL_HI]]
; VI: v_or_b32_sdwa v[[INS_HI:[0-9]+]], v[[HI]], [[COPY_VAL]] dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD		; VI: v_or_b32_sdwa v[[INS_HI:[0-9]+]], [[COPY_VAL]], v[[HI]] dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0

; CI: s_lshl_b32 [[VAL_HI:s[0-9]+]], [[VAL]], 16		; CI: s_lshl_b32 [[VAL_HI:s[0-9]+]], [[VAL]], 16
; CI: v_and_b32_e32 [[AND:v[0-9]+]], 0xffff, v[[HI]]		; CI: v_and_b32_e32 [[AND:v[0-9]+]], 0xffff, v[[HI]]
; CI: v_or_b32_e32 v[[INS_HI:[0-9]+]], [[VAL_HI]], [[AND]]		; CI: v_or_b32_e32 v[[INS_HI:[0-9]+]], [[VAL_HI]], [[AND]]

; GCN: {{flat\|global}}_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, v{{\[}}[[LO]]:[[INS_HI]]{{\]}}		; GCN: {{flat\|global}}_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, v{{\[}}[[LO]]:[[INS_HI]]{{\]}}
define amdgpu_kernel void @v_insertelement_v4f16_3(<4 x half> addrspace(1)* %out, <4 x half> addrspace(1)* %in, i32 %val) #0 {		define amdgpu_kernel void @v_insertelement_v4f16_3(<4 x half> addrspace(1)* %out, <4 x half> addrspace(1)* %in, i32 %val) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x() #1		%tid = call i32 @llvm.amdgcn.workitem.id.x() #1

test/CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll

	; VI: v_mov_b32_dpp v0, v1 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0 ; encoding: [0xfa,0x02,0x00,0x7e,0x01,0x01,0x08,0x11]			; VI: v_mov_b32_dpp v0, v1 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0 ; encoding: [0xfa,0x02,0x00,0x7e,0x01,0x01,0x08,0x11]
	define amdgpu_kernel void @dpp_test(i32 addrspace(1)* %out, i32 %in1, i32 %in2) {			define amdgpu_kernel void @dpp_test(i32 addrspace(1)* %out, i32 %in1, i32 %in2) {
	%tmp0 = call i32 @llvm.amdgcn.update.dpp.i32(i32 %in1, i32 %in2, i32 1, i32 1, i32 1, i1 1) #0			%tmp0 = call i32 @llvm.amdgcn.update.dpp.i32(i32 %in1, i32 %in2, i32 1, i32 1, i32 1, i1 1) #0
	store i32 %tmp0, i32 addrspace(1)* %out			store i32 %tmp0, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; VI-LABEL: {{^}}dpp_test1:			; VI-LABEL: {{^}}dpp_test1:
	; VI-OPT: v_add_u32_e32 [[REG:v[0-9]+]], vcc, v{{[0-9]+}}, v{{[0-9]+}}			; VI: v_add_u32_e32 [[REG:v[0-9]+]], vcc, v{{[0-9]+}}, v{{[0-9]+}}
	; VI-NOOPT: v_mov_b32_e32 v{{[0-9]+}}, 0			; VI-NOOPT: v_mov_b32_e32 v{{[0-9]+}}, 0
	; VI-NOOPT: v_mov_b32_e32 [[REG:v[0-9]+]], v{{[0-9]+}}
	; VI-NEXT: s_nop 0			; VI-NEXT: s_nop 0
				rampitec: This must be a leftover from some experiments; there is no such check.
	; VI-NEXT: s_nop 0			; VI-NEXT: s_nop 0
	; VI-NEXT: v_mov_b32_dpp v2, [[REG]] quad_perm:[1,0,3,2] row_mask:0xf bank_mask:0xf			; VI-NEXT: v_mov_b32_dpp v2, [[REG]] quad_perm:[1,0,3,2] row_mask:0xf bank_mask:0xf
	@0 = internal unnamed_addr addrspace(3) global [448 x i32] undef, align 4			@0 = internal unnamed_addr addrspace(3) global [448 x i32] undef, align 4
	define weak_odr amdgpu_kernel void @dpp_test1(i32* %arg) local_unnamed_addr {			define weak_odr amdgpu_kernel void @dpp_test1(i32* %arg) local_unnamed_addr {
	bb:			bb:
	%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()			%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()
	%tmp1 = zext i32 %tmp to i64			%tmp1 = zext i32 %tmp to i64
	%tmp2 = getelementptr inbounds [448 x i32], [448 x i32] addrspace(3)* @0, i32 0, i32 %tmp			%tmp2 = getelementptr inbounds [448 x i32], [448 x i32] addrspace(3)* @0, i32 0, i32 %tmp

test/CodeGen/AMDGPU/shift-i128.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN %s

	define i128 @v_shl_i128_vv(i128 %lhs, i128 %rhs) {			define i128 @v_shl_i128_vv(i128 %lhs, i128 %rhs) {
	; GCN-LABEL: v_shl_i128_vv:			; GCN-LABEL: v_shl_i128_vv:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_lshl_b64 v[5:6], v[0:1], v4			; GCN-NEXT: v_lshl_b64 v[5:6], v[2:3], v4
	; GCN-NEXT: v_lshl_b64 v[7:8], v[2:3], v4
	; GCN-NEXT: v_sub_i32_e32 v9, vcc, 64, v4			; GCN-NEXT: v_sub_i32_e32 v9, vcc, 64, v4
	; GCN-NEXT: v_subrev_i32_e32 v11, vcc, 64, v4			; GCN-NEXT: v_subrev_i32_e32 v11, vcc, 64, v4
	; GCN-NEXT: v_cmp_gt_u32_e32 vcc, 64, v4			; GCN-NEXT: v_lshl_b64 v[7:8], v[0:1], v4
	; GCN-NEXT: v_cndmask_b32_e32 v6, 0, v6, vcc
	; GCN-NEXT: v_cndmask_b32_e32 v5, 0, v5, vcc
	; GCN-NEXT: v_lshr_b64 v[9:10], v[0:1], v9			; GCN-NEXT: v_lshr_b64 v[9:10], v[0:1], v9
	; GCN-NEXT: v_lshl_b64 v[0:1], v[0:1], v11			; GCN-NEXT: v_lshl_b64 v[0:1], v[0:1], v11
	; GCN-NEXT: v_or_b32_e32 v7, v7, v9			; GCN-NEXT: v_or_b32_e32 v6, v6, v10
	; GCN-NEXT: v_or_b32_e32 v8, v8, v10			; GCN-NEXT: v_or_b32_e32 v5, v5, v9
	; GCN-NEXT: v_cndmask_b32_e32 v1, v1, v8, vcc			; GCN-NEXT: v_cmp_gt_u32_e32 vcc, 64, v4
	; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v7, vcc			; GCN-NEXT: v_cndmask_b32_e32 v6, v1, v6, vcc
	; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4			; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v5, vcc
	; GCN-NEXT: v_cndmask_b32_e32 v3, v1, v3, vcc			; GCN-NEXT: v_cndmask_b32_e32 v1, 0, v8, vcc
	; GCN-NEXT: v_cndmask_b32_e32 v2, v0, v2, vcc			; GCN-NEXT: v_cmp_eq_u32_e64 s[6:7], 0, v4
	; GCN-NEXT: v_mov_b32_e32 v0, v5			; GCN-NEXT: v_cndmask_b32_e64 v3, v6, v3, s[6:7]
	; GCN-NEXT: v_mov_b32_e32 v1, v6			; GCN-NEXT: v_cndmask_b32_e64 v2, v0, v2, s[6:7]
				; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v7, vcc
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
				rampitec: Was it done with update_llc_test_checks.py? It is strange to see indents changed. I suspect the file was really changed manually instead.
	%shl = shl i128 %lhs, %rhs			%shl = shl i128 %lhs, %rhs
	ret i128 %shl			ret i128 %shl
	}			}

	define i128 @v_lshr_i128_vv(i128 %lhs, i128 %rhs) {			define i128 @v_lshr_i128_vv(i128 %lhs, i128 %rhs) {
	; GCN-LABEL: v_lshr_i128_vv:			; GCN-LABEL: v_lshr_i128_vv:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_lshr_b64 v[5:6], v[2:3], v4			; GCN-NEXT: v_lshr_b64 v[5:6], v[0:1], v4
	; GCN-NEXT: v_lshr_b64 v[7:8], v[0:1], v4
	; GCN-NEXT: v_sub_i32_e32 v9, vcc, 64, v4			; GCN-NEXT: v_sub_i32_e32 v9, vcc, 64, v4
	; GCN-NEXT: v_subrev_i32_e32 v11, vcc, 64, v4			; GCN-NEXT: v_subrev_i32_e32 v11, vcc, 64, v4
	; GCN-NEXT: v_cmp_gt_u32_e32 vcc, 64, v4			; GCN-NEXT: v_lshr_b64 v[7:8], v[2:3], v4
	; GCN-NEXT: v_cndmask_b32_e32 v6, 0, v6, vcc
	; GCN-NEXT: v_cndmask_b32_e32 v5, 0, v5, vcc
	; GCN-NEXT: v_lshl_b64 v[9:10], v[2:3], v9			; GCN-NEXT: v_lshl_b64 v[9:10], v[2:3], v9
	; GCN-NEXT: v_lshr_b64 v[2:3], v[2:3], v11			; GCN-NEXT: v_lshr_b64 v[2:3], v[2:3], v11
	; GCN-NEXT: v_or_b32_e32 v7, v7, v9			; GCN-NEXT: v_or_b32_e32 v6, v6, v10
	; GCN-NEXT: v_or_b32_e32 v8, v8, v10			; GCN-NEXT: v_or_b32_e32 v5, v5, v9
	; GCN-NEXT: v_cndmask_b32_e32 v3, v3, v8, vcc			; GCN-NEXT: v_cmp_gt_u32_e32 vcc, 64, v4
	; GCN-NEXT: v_cndmask_b32_e32 v2, v2, v7, vcc			; GCN-NEXT: v_cndmask_b32_e32 v6, v3, v6, vcc
	; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4			; GCN-NEXT: v_cndmask_b32_e32 v2, v2, v5, vcc
	; GCN-NEXT: v_cndmask_b32_e32 v1, v3, v1, vcc			; GCN-NEXT: v_cndmask_b32_e32 v3, 0, v8, vcc
	; GCN-NEXT: v_cndmask_b32_e32 v0, v2, v0, vcc			; GCN-NEXT: v_cmp_eq_u32_e64 s[6:7], 0, v4
	; GCN-NEXT: v_mov_b32_e32 v2, v5			; GCN-NEXT: v_cndmask_b32_e64 v1, v6, v1, s[6:7]
	; GCN-NEXT: v_mov_b32_e32 v3, v6			; GCN-NEXT: v_cndmask_b32_e64 v0, v2, v0, s[6:7]
				; GCN-NEXT: v_cndmask_b32_e32 v2, 0, v7, vcc
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]

	%shl = lshr i128 %lhs, %rhs			%shl = lshr i128 %lhs, %rhs
	ret i128 %shl			ret i128 %shl
	}			}

	define i128 @v_ashr_i128_vv(i128 %lhs, i128 %rhs) {			define i128 @v_ashr_i128_vv(i128 %lhs, i128 %rhs) {
	; GCN-LABEL: v_ashr_i128_vv:			; GCN-LABEL: v_ashr_i128_vv:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_ashrrev_i32_e32 v9, 31, v3			; GCN-NEXT: v_ashrrev_i32_e32 v9, 31, v3
	; GCN-NEXT: v_ashr_i64 v[5:6], v[2:3], v4			; GCN-NEXT: v_ashr_i64 v[5:6], v[2:3], v4
	; GCN-NEXT: v_lshr_b64 v[7:8], v[0:1], v4			; GCN-NEXT: v_lshr_b64 v[7:8], v[0:1], v4
	; GCN-NEXT: v_sub_i32_e32 v10, vcc, 64, v4			; GCN-NEXT: v_sub_i32_e32 v10, vcc, 64, v4
	; GCN-NEXT: v_subrev_i32_e32 v11, vcc, 64, v4			; GCN-NEXT: v_subrev_i32_e32 v11, vcc, 64, v4
	; GCN-NEXT: v_cmp_gt_u32_e32 vcc, 64, v4			; GCN-NEXT: v_cmp_gt_u32_e32 vcc, 64, v4
	; GCN-NEXT: v_cndmask_b32_e32 v6, v9, v6, vcc			; GCN-NEXT: v_cndmask_b32_e32 v6, v9, v6, vcc
	; GCN-NEXT: v_cndmask_b32_e32 v5, v9, v5, vcc			; GCN-NEXT: v_cndmask_b32_e32 v5, v9, v5, vcc
	; GCN-NEXT: v_lshl_b64 v[9:10], v[2:3], v10			; GCN-NEXT: v_lshl_b64 v[9:10], v[2:3], v10
	; GCN-NEXT: v_ashr_i64 v[2:3], v[2:3], v11			; GCN-NEXT: v_ashr_i64 v[2:3], v[2:3], v11
	; GCN-NEXT: v_or_b32_e32 v7, v7, v9
	; GCN-NEXT: v_or_b32_e32 v8, v8, v10			; GCN-NEXT: v_or_b32_e32 v8, v8, v10
				; GCN-NEXT: v_or_b32_e32 v7, v7, v9
	; GCN-NEXT: v_cndmask_b32_e32 v3, v3, v8, vcc			; GCN-NEXT: v_cndmask_b32_e32 v3, v3, v8, vcc
	; GCN-NEXT: v_cndmask_b32_e32 v2, v2, v7, vcc			; GCN-NEXT: v_cndmask_b32_e32 v2, v2, v7, vcc
	; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4			; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4
	; GCN-NEXT: v_cndmask_b32_e32 v1, v3, v1, vcc			; GCN-NEXT: v_cndmask_b32_e32 v1, v3, v1, vcc
	; GCN-NEXT: v_cndmask_b32_e32 v0, v2, v0, vcc			; GCN-NEXT: v_cndmask_b32_e32 v0, v2, v0, vcc
	; GCN-NEXT: v_mov_b32_e32 v2, v5			; GCN-NEXT: v_mov_b32_e32 v2, v5
	; GCN-NEXT: v_mov_b32_e32 v3, v6			; GCN-NEXT: v_mov_b32_e32 v3, v6
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]

	define <2 x i128> @v_shl_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {			define <2 x i128> @v_shl_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
	; GCN-LABEL: v_shl_v2i128_vv:			; GCN-LABEL: v_shl_v2i128_vv:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_lshl_b64 v[16:17], v[2:3], v8			; GCN-NEXT: v_lshl_b64 v[16:17], v[2:3], v8
	; GCN-NEXT: v_sub_i32_e32 v18, vcc, 64, v8			; GCN-NEXT: v_sub_i32_e32 v18, vcc, 64, v8
	; GCN-NEXT: v_lshr_b64 v[18:19], v[0:1], v18			; GCN-NEXT: v_lshr_b64 v[18:19], v[0:1], v18
	; GCN-NEXT: v_or_b32_e32 v20, v16, v18			; GCN-NEXT: v_or_b32_e32 v20, v17, v19
	; GCN-NEXT: v_or_b32_e32 v21, v17, v19			; GCN-NEXT: v_or_b32_e32 v21, v16, v18
	; GCN-NEXT: v_sub_i32_e32 v16, vcc, 64, v12			; GCN-NEXT: v_sub_i32_e32 v16, vcc, 64, v12
	; GCN-NEXT: v_lshr_b64 v[16:17], v[4:5], v16			; GCN-NEXT: v_lshr_b64 v[16:17], v[4:5], v16
	; GCN-NEXT: v_lshl_b64 v[18:19], v[6:7], v12			; GCN-NEXT: v_lshl_b64 v[18:19], v[6:7], v12
	; GCN-NEXT: v_or_b32_e32 v16, v18, v16
	; GCN-NEXT: v_or_b32_e32 v17, v19, v17			; GCN-NEXT: v_or_b32_e32 v17, v19, v17
				; GCN-NEXT: v_or_b32_e32 v16, v18, v16
	; GCN-NEXT: v_cmp_eq_u64_e64 s[6:7], 0, v[10:11]			; GCN-NEXT: v_cmp_eq_u64_e64 s[6:7], 0, v[10:11]
	; GCN-NEXT: v_or_b32_e32 v10, v8, v10
	; GCN-NEXT: v_or_b32_e32 v11, v9, v11			; GCN-NEXT: v_or_b32_e32 v11, v9, v11
				; GCN-NEXT: v_or_b32_e32 v10, v8, v10
	; GCN-NEXT: v_cmp_eq_u64_e64 s[8:9], 0, v[14:15]			; GCN-NEXT: v_cmp_eq_u64_e64 s[8:9], 0, v[14:15]
	; GCN-NEXT: v_or_b32_e32 v14, v12, v14
	; GCN-NEXT: v_or_b32_e32 v15, v13, v15			; GCN-NEXT: v_or_b32_e32 v15, v13, v15
				; GCN-NEXT: v_or_b32_e32 v14, v12, v14
	; GCN-NEXT: v_cmp_gt_u64_e64 s[10:11], 64, v[8:9]			; GCN-NEXT: v_cmp_gt_u64_e64 s[10:11], 64, v[8:9]
	; GCN-NEXT: v_subrev_i32_e32 v18, vcc, 64, v8			; GCN-NEXT: v_subrev_i32_e32 v18, vcc, 64, v8
	; GCN-NEXT: v_lshl_b64 v[8:9], v[0:1], v8			; GCN-NEXT: v_lshl_b64 v[8:9], v[0:1], v8
	; GCN-NEXT: v_lshl_b64 v[0:1], v[0:1], v18			; GCN-NEXT: v_lshl_b64 v[0:1], v[0:1], v18
	; GCN-NEXT: v_cmp_gt_u64_e64 s[12:13], 64, v[12:13]			; GCN-NEXT: v_cmp_gt_u64_e64 s[12:13], 64, v[12:13]
	; GCN-NEXT: v_subrev_i32_e32 v18, vcc, 64, v12			; GCN-NEXT: v_subrev_i32_e32 v18, vcc, 64, v12
	; GCN-NEXT: v_lshl_b64 v[12:13], v[4:5], v12			; GCN-NEXT: v_lshl_b64 v[12:13], v[4:5], v12
	; GCN-NEXT: v_lshl_b64 v[4:5], v[4:5], v18			; GCN-NEXT: v_lshl_b64 v[4:5], v[4:5], v18
	; GCN-NEXT: s_and_b64 vcc, s[6:7], s[10:11]			; GCN-NEXT: s_and_b64 vcc, s[6:7], s[10:11]
	; GCN-NEXT: v_cndmask_b32_e32 v18, v1, v21, vcc			; GCN-NEXT: v_cndmask_b32_e32 v18, v1, v20, vcc
	; GCN-NEXT: v_cndmask_b32_e32 v19, v0, v20, vcc			; GCN-NEXT: v_cndmask_b32_e32 v19, v0, v21, vcc
	; GCN-NEXT: s_and_b64 s[6:7], s[8:9], s[12:13]			; GCN-NEXT: s_and_b64 s[6:7], s[8:9], s[12:13]
	; GCN-NEXT: v_cndmask_b32_e64 v17, v5, v17, s[6:7]			; GCN-NEXT: v_cndmask_b32_e64 v17, v5, v17, s[6:7]
	; GCN-NEXT: v_cndmask_b32_e64 v4, v4, v16, s[6:7]			; GCN-NEXT: v_cndmask_b32_e64 v4, v4, v16, s[6:7]
	; GCN-NEXT: v_cndmask_b32_e32 v1, 0, v9, vcc			; GCN-NEXT: v_cndmask_b32_e32 v1, 0, v9, vcc
	; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v8, vcc			; GCN-NEXT: v_cndmask_b32_e32 v0, 0, v8, vcc
	; GCN-NEXT: v_cndmask_b32_e64 v5, 0, v13, s[6:7]			; GCN-NEXT: v_cndmask_b32_e64 v5, 0, v13, s[6:7]
	; GCN-NEXT: v_cmp_eq_u64_e32 vcc, 0, v[10:11]			; GCN-NEXT: v_cmp_eq_u64_e32 vcc, 0, v[10:11]
	; GCN-NEXT: v_cndmask_b32_e32 v3, v18, v3, vcc			; GCN-NEXT: v_cndmask_b32_e32 v3, v18, v3, vcc

	define <2 x i128> @v_lshr_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {			define <2 x i128> @v_lshr_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
	; GCN-LABEL: v_lshr_v2i128_vv:			; GCN-LABEL: v_lshr_v2i128_vv:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_lshr_b64 v[16:17], v[0:1], v8			; GCN-NEXT: v_lshr_b64 v[16:17], v[0:1], v8
	; GCN-NEXT: v_sub_i32_e32 v18, vcc, 64, v8			; GCN-NEXT: v_sub_i32_e32 v18, vcc, 64, v8
	; GCN-NEXT: v_lshl_b64 v[18:19], v[2:3], v18			; GCN-NEXT: v_lshl_b64 v[18:19], v[2:3], v18
	; GCN-NEXT: v_or_b32_e32 v20, v16, v18			; GCN-NEXT: v_or_b32_e32 v20, v17, v19
	; GCN-NEXT: v_or_b32_e32 v21, v17, v19			; GCN-NEXT: v_or_b32_e32 v21, v16, v18
	; GCN-NEXT: v_sub_i32_e32 v16, vcc, 64, v12			; GCN-NEXT: v_sub_i32_e32 v16, vcc, 64, v12
	; GCN-NEXT: v_lshl_b64 v[16:17], v[6:7], v16			; GCN-NEXT: v_lshl_b64 v[16:17], v[6:7], v16
	; GCN-NEXT: v_lshr_b64 v[18:19], v[4:5], v12			; GCN-NEXT: v_lshr_b64 v[18:19], v[4:5], v12
	; GCN-NEXT: v_or_b32_e32 v16, v18, v16
	; GCN-NEXT: v_or_b32_e32 v17, v19, v17			; GCN-NEXT: v_or_b32_e32 v17, v19, v17
				; GCN-NEXT: v_or_b32_e32 v16, v18, v16
	; GCN-NEXT: v_cmp_eq_u64_e64 s[6:7], 0, v[10:11]			; GCN-NEXT: v_cmp_eq_u64_e64 s[6:7], 0, v[10:11]
	; GCN-NEXT: v_or_b32_e32 v10, v8, v10
	; GCN-NEXT: v_or_b32_e32 v11, v9, v11			; GCN-NEXT: v_or_b32_e32 v11, v9, v11
				; GCN-NEXT: v_or_b32_e32 v10, v8, v10
	; GCN-NEXT: v_cmp_eq_u64_e64 s[8:9], 0, v[14:15]			; GCN-NEXT: v_cmp_eq_u64_e64 s[8:9], 0, v[14:15]
	; GCN-NEXT: v_or_b32_e32 v14, v12, v14
	; GCN-NEXT: v_or_b32_e32 v15, v13, v15			; GCN-NEXT: v_or_b32_e32 v15, v13, v15
				; GCN-NEXT: v_or_b32_e32 v14, v12, v14
	; GCN-NEXT: v_cmp_gt_u64_e64 s[10:11], 64, v[8:9]			; GCN-NEXT: v_cmp_gt_u64_e64 s[10:11], 64, v[8:9]
	; GCN-NEXT: v_subrev_i32_e32 v18, vcc, 64, v8			; GCN-NEXT: v_subrev_i32_e32 v18, vcc, 64, v8
	; GCN-NEXT: v_lshr_b64 v[8:9], v[2:3], v8			; GCN-NEXT: v_lshr_b64 v[8:9], v[2:3], v8
	; GCN-NEXT: v_lshr_b64 v[2:3], v[2:3], v18			; GCN-NEXT: v_lshr_b64 v[2:3], v[2:3], v18
	; GCN-NEXT: v_cmp_gt_u64_e64 s[12:13], 64, v[12:13]			; GCN-NEXT: v_cmp_gt_u64_e64 s[12:13], 64, v[12:13]
	; GCN-NEXT: v_subrev_i32_e32 v18, vcc, 64, v12			; GCN-NEXT: v_subrev_i32_e32 v18, vcc, 64, v12
	; GCN-NEXT: v_lshr_b64 v[12:13], v[6:7], v12			; GCN-NEXT: v_lshr_b64 v[12:13], v[6:7], v12
	; GCN-NEXT: v_lshr_b64 v[6:7], v[6:7], v18			; GCN-NEXT: v_lshr_b64 v[6:7], v[6:7], v18
	; GCN-NEXT: s_and_b64 vcc, s[6:7], s[10:11]			; GCN-NEXT: s_and_b64 vcc, s[6:7], s[10:11]
	; GCN-NEXT: v_cndmask_b32_e32 v18, v3, v21, vcc			; GCN-NEXT: v_cndmask_b32_e32 v18, v3, v20, vcc
	; GCN-NEXT: v_cndmask_b32_e32 v19, v2, v20, vcc			; GCN-NEXT: v_cndmask_b32_e32 v19, v2, v21, vcc
	; GCN-NEXT: s_and_b64 s[6:7], s[8:9], s[12:13]			; GCN-NEXT: s_and_b64 s[6:7], s[8:9], s[12:13]
	; GCN-NEXT: v_cndmask_b32_e64 v17, v7, v17, s[6:7]			; GCN-NEXT: v_cndmask_b32_e64 v17, v7, v17, s[6:7]
	; GCN-NEXT: v_cndmask_b32_e64 v6, v6, v16, s[6:7]			; GCN-NEXT: v_cndmask_b32_e64 v6, v6, v16, s[6:7]
	; GCN-NEXT: v_cndmask_b32_e32 v3, 0, v9, vcc			; GCN-NEXT: v_cndmask_b32_e32 v3, 0, v9, vcc
	; GCN-NEXT: v_cndmask_b32_e32 v2, 0, v8, vcc			; GCN-NEXT: v_cndmask_b32_e32 v2, 0, v8, vcc
	; GCN-NEXT: v_cndmask_b32_e64 v7, 0, v13, s[6:7]			; GCN-NEXT: v_cndmask_b32_e64 v7, 0, v13, s[6:7]
	; GCN-NEXT: v_cmp_eq_u64_e32 vcc, 0, v[10:11]			; GCN-NEXT: v_cmp_eq_u64_e32 vcc, 0, v[10:11]
	; GCN-NEXT: v_cndmask_b32_e32 v1, v18, v1, vcc			; GCN-NEXT: v_cndmask_b32_e32 v1, v18, v1, vcc

	define <2 x i128> @v_ashr_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {			define <2 x i128> @v_ashr_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
	; GCN-LABEL: v_ashr_v2i128_vv:			; GCN-LABEL: v_ashr_v2i128_vv:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_lshr_b64 v[16:17], v[0:1], v8			; GCN-NEXT: v_lshr_b64 v[16:17], v[0:1], v8
	; GCN-NEXT: v_sub_i32_e32 v18, vcc, 64, v8			; GCN-NEXT: v_sub_i32_e32 v18, vcc, 64, v8
	; GCN-NEXT: v_lshl_b64 v[18:19], v[2:3], v18			; GCN-NEXT: v_lshl_b64 v[18:19], v[2:3], v18
	; GCN-NEXT: v_or_b32_e32 v20, v16, v18			; GCN-NEXT: v_or_b32_e32 v20, v17, v19
	; GCN-NEXT: v_or_b32_e32 v21, v17, v19			; GCN-NEXT: v_or_b32_e32 v21, v16, v18
	; GCN-NEXT: v_sub_i32_e32 v16, vcc, 64, v12			; GCN-NEXT: v_sub_i32_e32 v16, vcc, 64, v12
	; GCN-NEXT: v_lshl_b64 v[16:17], v[6:7], v16			; GCN-NEXT: v_lshl_b64 v[16:17], v[6:7], v16
	; GCN-NEXT: v_lshr_b64 v[18:19], v[4:5], v12			; GCN-NEXT: v_lshr_b64 v[18:19], v[4:5], v12
	; GCN-NEXT: v_or_b32_e32 v18, v18, v16
	; GCN-NEXT: v_or_b32_e32 v19, v19, v17			; GCN-NEXT: v_or_b32_e32 v19, v19, v17
				; GCN-NEXT: v_or_b32_e32 v18, v18, v16
	; GCN-NEXT: v_cmp_eq_u64_e64 s[6:7], 0, v[10:11]			; GCN-NEXT: v_cmp_eq_u64_e64 s[6:7], 0, v[10:11]
	; GCN-NEXT: v_or_b32_e32 v10, v8, v10
	; GCN-NEXT: v_or_b32_e32 v11, v9, v11			; GCN-NEXT: v_or_b32_e32 v11, v9, v11
				; GCN-NEXT: v_or_b32_e32 v10, v8, v10
	; GCN-NEXT: v_cmp_eq_u64_e64 s[8:9], 0, v[14:15]			; GCN-NEXT: v_cmp_eq_u64_e64 s[8:9], 0, v[14:15]
	; GCN-NEXT: v_or_b32_e32 v14, v12, v14
	; GCN-NEXT: v_or_b32_e32 v15, v13, v15			; GCN-NEXT: v_or_b32_e32 v15, v13, v15
				; GCN-NEXT: v_or_b32_e32 v14, v12, v14
	; GCN-NEXT: v_cmp_gt_u64_e64 s[10:11], 64, v[8:9]			; GCN-NEXT: v_cmp_gt_u64_e64 s[10:11], 64, v[8:9]
	; GCN-NEXT: v_subrev_i32_e32 v9, vcc, 64, v8			; GCN-NEXT: v_subrev_i32_e32 v9, vcc, 64, v8
	; GCN-NEXT: v_ashr_i64 v[16:17], v[2:3], v9			; GCN-NEXT: v_ashr_i64 v[16:17], v[2:3], v9
	; GCN-NEXT: s_and_b64 s[6:7], s[6:7], s[10:11]			; GCN-NEXT: s_and_b64 s[6:7], s[6:7], s[10:11]
	; GCN-NEXT: v_cndmask_b32_e64 v17, v17, v21, s[6:7]			; GCN-NEXT: v_cndmask_b32_e64 v17, v17, v20, s[6:7]
	; GCN-NEXT: v_cndmask_b32_e64 v16, v16, v20, s[6:7]			; GCN-NEXT: v_cndmask_b32_e64 v16, v16, v21, s[6:7]
	; GCN-NEXT: v_cmp_gt_u64_e64 s[10:11], 64, v[12:13]			; GCN-NEXT: v_cmp_gt_u64_e64 s[10:11], 64, v[12:13]
	; GCN-NEXT: v_ashr_i64 v[8:9], v[2:3], v8			; GCN-NEXT: v_ashr_i64 v[8:9], v[2:3], v8
	; GCN-NEXT: v_ashrrev_i32_e32 v20, 31, v3			; GCN-NEXT: v_ashrrev_i32_e32 v20, 31, v3
	; GCN-NEXT: v_subrev_i32_e32 v2, vcc, 64, v12			; GCN-NEXT: v_subrev_i32_e32 v2, vcc, 64, v12
	; GCN-NEXT: v_ashr_i64 v[12:13], v[6:7], v12			; GCN-NEXT: v_ashr_i64 v[12:13], v[6:7], v12
	; GCN-NEXT: v_ashrrev_i32_e32 v21, 31, v7			; GCN-NEXT: v_ashrrev_i32_e32 v21, 31, v7
	; GCN-NEXT: v_ashr_i64 v[2:3], v[6:7], v2			; GCN-NEXT: v_ashr_i64 v[2:3], v[6:7], v2
	; GCN-NEXT: s_and_b64 vcc, s[8:9], s[10:11]			; GCN-NEXT: s_and_b64 vcc, s[8:9], s[10:11]

test/CodeGen/AMDGPU/shl.v2i16.ll

	; VI: v_lshlrev_b16_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}			; VI: v_lshlrev_b16_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
	; VI: v_lshlrev_b16_sdwa v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}} dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1			; VI: v_lshlrev_b16_sdwa v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}} dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
	; VI: v_or_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}			; VI: v_or_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}

	; CI: s_mov_b32 [[MASK:s[0-9]+]], 0xffff{{$}}			; CI: s_mov_b32 [[MASK:s[0-9]+]], 0xffff{{$}}
	; CI: v_lshrrev_b32_e32 v{{[0-9]+}}, 16, [[LHS]]			; CI: v_lshrrev_b32_e32 v{{[0-9]+}}, 16, [[LHS]]
	; CI: v_lshrrev_b32_e32 v{{[0-9]+}}, 16, v{{[0-9]+}}			; CI: v_lshrrev_b32_e32 v{{[0-9]+}}, 16, v{{[0-9]+}}
	; CI: v_lshlrev_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}			; CI: v_lshlrev_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
	; CI: v_lshl_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}			; CI: v_lshlrev_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
	; CI: v_lshlrev_b32_e32 v{{[0-9]+}}, 16, v{{[0-9]+}}			; CI: v_lshlrev_b32_e32 v{{[0-9]+}}, 16, v{{[0-9]+}}
	; CI: v_and_b32_e32 v{{[0-9]+}}, [[MASK]], v{{[0-9]+}}			; CI: v_and_b32_e32 v{{[0-9]+}}, [[MASK]], v{{[0-9]+}}
	; CI: v_or_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}			; CI: v_or_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
	define amdgpu_kernel void @v_shl_v2i16(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in) #0 {			define amdgpu_kernel void @v_shl_v2i16(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in) #0 {
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%tid.ext = sext i32 %tid to i64			%tid.ext = sext i32 %tid to i64
	%in.gep = getelementptr inbounds <2 x i16>, <2 x i16> addrspace(1)* %in, i64 %tid.ext			%in.gep = getelementptr inbounds <2 x i16>, <2 x i16> addrspace(1)* %in, i64 %tid.ext
	%out.gep = getelementptr inbounds <2 x i16>, <2 x i16> addrspace(1)* %out, i64 %tid.ext			%out.gep = getelementptr inbounds <2 x i16>, <2 x i16> addrspace(1)* %out, i64 %tid.ext

test/CodeGen/AMDGPU/sibling-call.ll

; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -mattr=-flat-for-global -enable-ipra=0 -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,VI,CIVI,MESA %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -mattr=-flat-for-global -enable-ipra=0 -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,VI,CIVI,MESA %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -enable-ipra=0 -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CI,CIVI,MESA %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -enable-ipra=0 -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CI,CIVI,MESA %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-flat-for-global -enable-ipra=0 -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9,MESA %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-flat-for-global -enable-ipra=0 -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9,MESA %s
target datalayout = "A5"		target datalayout = "A5"

; FIXME: Why is this commuted only sometimes?		; FIXME: Why is this commuted only sometimes?
; GCN-LABEL: {{^}}i32_fastcc_i32_i32:		; GCN-LABEL: {{^}}i32_fastcc_i32_i32:
; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; CIVI-NEXT: v_add_{{i|u}}32_e32 v0, vcc, v1, v0		; CIVI-NEXT: v_add_{{i|u}}32_e32 v0, vcc, v0, v1
; GFX9-NEXT: v_add_u32_e32 v0, v0, v1		; GFX9-NEXT: v_add_u32_e32 v0, v0, v1
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define fastcc i32 @i32_fastcc_i32_i32(i32 %arg0, i32 %arg1) #1 {		define fastcc i32 @i32_fastcc_i32_i32(i32 %arg0, i32 %arg1) #1 {
%add0 = add i32 %arg0, %arg1		%add0 = add i32 %arg0, %arg1
ret i32 %add0		ret i32 %add0
}		}

; GCN-LABEL: {{^}}i32_fastcc_i32_i32_stack_object:		; GCN-LABEL: {{^}}i32_fastcc_i32_i32_stack_object:
; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; CIVI-NEXT: v_add_{{i|u}}32_e32 v0, vcc, v1, v0		; CIVI-NEXT: v_add_{{i|u}}32_e32 v0, vcc, v0, v1
; GFX9-NEXT: v_add_u32_e32 v0, v0, v1		; GFX9-NEXT: v_add_u32_e32 v0, v0, v1
; GCN: s_mov_b32 s5, s32		; GCN: s_mov_b32 s5, s32
; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s5 offset:24		; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s5 offset:24
; GCN: s_waitcnt vmcnt(0)		; GCN: s_waitcnt vmcnt(0)
; GCN: s_setpc_b64		; GCN: s_setpc_b64
; GCN: ; ScratchSize: 68		; GCN: ; ScratchSize: 68
define fastcc i32 @i32_fastcc_i32_i32_stack_object(i32 %arg0, i32 %arg1) #1 {		define fastcc i32 @i32_fastcc_i32_i32_stack_object(i32 %arg0, i32 %arg1) #1 {
%alloca = alloca [16 x i32], align 4, addrspace(5)		%alloca = alloca [16 x i32], align 4, addrspace(5)
}		}

; GCN-LABEL: {{^}}i32_fastcc_i32_byval_i32:		; GCN-LABEL: {{^}}i32_fastcc_i32_byval_i32:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_mov_b32 s5, s32		; GCN-NEXT: s_mov_b32 s5, s32
; GCN-NEXT: buffer_load_dword v1, off, s[0:3], s5 offset:4		; GCN-NEXT: buffer_load_dword v1, off, s[0:3], s5 offset:4
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)

; CIVI-NEXT: v_add_{{i|u}}32_e32 v0, vcc, v1, v0		; CIVI-NEXT: v_add_{{i|u}}32_e32 v0, vcc, v0, v1
; GFX9-NEXT: v_add_u32_e32 v0, v0, v1		; GFX9-NEXT: v_add_u32_e32 v0, v0, v1

; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
define fastcc i32 @i32_fastcc_i32_byval_i32(i32 %arg0, i32 addrspace(5)* byval align 4 %arg1) #1 {		define fastcc i32 @i32_fastcc_i32_byval_i32(i32 %arg0, i32 addrspace(5)* byval align 4 %arg1) #1 {
%arg1.load = load i32, i32 addrspace(5)* %arg1, align 4		%arg1.load = load i32, i32 addrspace(5)* %arg1, align 4
%add0 = add i32 %arg0, %arg1.load		%add0 = add i32 %arg0, %arg1.load
ret i32 %add0		ret i32 %add0
}		}
entry:
ret i32 %ret		ret i32 %ret
}		}

; GCN-LABEL: {{^}}i32_fastcc_i32_i32_a32i32:		; GCN-LABEL: {{^}}i32_fastcc_i32_i32_a32i32:
; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-DAG: buffer_load_dword [[LOAD_0:v[0-9]+]], off, s[0:3], s5 offset:4		; GCN-DAG: buffer_load_dword [[LOAD_0:v[0-9]+]], off, s[0:3], s5 offset:4
; GCN-DAG: buffer_load_dword [[LOAD_1:v[0-9]+]], off, s[0:3], s5 offset:8		; GCN-DAG: buffer_load_dword [[LOAD_1:v[0-9]+]], off, s[0:3], s5 offset:8

; CIVI-NEXT: v_add_{{i|u}}32_e32 v0, vcc, v1, v0		; CIVI-NEXT: v_add_{{i|u}}32_e32 v0, vcc, v0, v1
; CIVI: v_add_{{i|u}}32_e32 v0, vcc, [[LOAD_0]], v0		; CIVI: v_add_{{i|u}}32_e32 v0, vcc, v0, [[LOAD_0]]
; CIVI: v_add_{{i|u}}32_e32 v0, vcc, [[LOAD_1]], v0		; CIVI: v_add_{{i|u}}32_e32 v0, vcc, v0, [[LOAD_1]]


; GFX9-NEXT: v_add_u32_e32 v0, v0, v1		; GFX9-NEXT: v_add_u32_e32 v0, v0, v1
; GFX9: v_add_u32_e32 v0, v0, [[LOAD_0]]		; GFX9: v_add_u32_e32 v0, v0, [[LOAD_0]]
; GFX9: v_add_u32_e32 v0, v0, [[LOAD_1]]		; GFX9: v_add_u32_e32 v0, v0, [[LOAD_1]]

; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define fastcc i32 @i32_fastcc_i32_i32_a32i32(i32 %arg0, i32 %arg1, [32 x i32] %large) #1 {		define fastcc i32 @i32_fastcc_i32_i32_a32i32(i32 %arg0, i32 %arg1, [32 x i32] %large) #1 {

test/CodeGen/AMDGPU/sub.ll

define amdgpu_kernel void @test_sub_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
%a = load <4 x i32>, <4 x i32> addrspace(1) * %in		%a = load <4 x i32>, <4 x i32> addrspace(1) * %in
%b = load <4 x i32>, <4 x i32> addrspace(1) * %b_ptr		%b = load <4 x i32>, <4 x i32> addrspace(1) * %b_ptr
%result = sub <4 x i32> %a, %b		%result = sub <4 x i32> %a, %b
store <4 x i32> %result, <4 x i32> addrspace(1)* %out		store <4 x i32> %result, <4 x i32> addrspace(1)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}test_sub_i16:		; FUNC-LABEL: {{^}}test_sub_i16:
; SI: v_subrev_i32_e32 v{{[0-9]+}}, vcc,		; SI: v_sub_i32_e32 v{{[0-9]+}}, vcc,
; GFX89: v_sub_u16_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}		; GFX89: v_sub_u16_e32 v{{[0-9]+, v[0-9]+, v[0-9]+}}
define amdgpu_kernel void @test_sub_i16(i16 addrspace(1)* %out, i16 addrspace(1)* %in) {		define amdgpu_kernel void @test_sub_i16(i16 addrspace(1)* %out, i16 addrspace(1)* %in) {
%tid = call i32 @llvm.r600.read.tidig.x()		%tid = call i32 @llvm.r600.read.tidig.x()
%gep = getelementptr i16, i16 addrspace(1)* %in, i32 %tid		%gep = getelementptr i16, i16 addrspace(1)* %in, i32 %tid
%b_ptr = getelementptr i16, i16 addrspace(1)* %gep, i32 1		%b_ptr = getelementptr i16, i16 addrspace(1)* %gep, i32 1
%a = load volatile i16, i16 addrspace(1)* %gep		%a = load volatile i16, i16 addrspace(1)* %gep
%b = load volatile i16, i16 addrspace(1)* %b_ptr		%b = load volatile i16, i16 addrspace(1)* %b_ptr
%result = sub i16 %a, %b		%result = sub i16 %a, %b

Revision Contents

Diff 166260

lib/Target/AMDGPU/AMDGPU.td

lib/Target/AMDGPU/SIInstrInfo.td

lib/Target/AMDGPU/SIInstructions.td

lib/Target/AMDGPU/SOPInstructions.td

lib/Target/AMDGPU/VOP2Instructions.td

lib/Target/AMDGPU/VOPInstructions.td

test/CodeGen/AMDGPU/add.ll

test/CodeGen/AMDGPU/amdgcn.private-memory.ll

test/CodeGen/AMDGPU/bfe-patterns.ll

test/CodeGen/AMDGPU/ctpop64.ll

test/CodeGen/AMDGPU/extract-lowbits.ll

test/CodeGen/AMDGPU/fabs.f16.ll

test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll

test/CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll

test/CodeGen/AMDGPU/shift-i128.ll

test/CodeGen/AMDGPU/shl.v2i16.ll

test/CodeGen/AMDGPU/sibling-call.ll

test/CodeGen/AMDGPU/sub.ll
