This is an archive of the discontinued LLVM Phabricator instance.

Differential D20796

[AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when src2 == VCC.
ClosedPublic

Authored by artem.tamazov on May 30 2016, 11:07 AM.


Details

Reviewers

tstellarAMD
arsenm
SamWot

Commits

rG135487767b36: [AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when…
rL271900: [AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when…

Summary

The change unifies the LLVM assembler/disassembler syntax with that of sp3.
In addition, CodeGen output improves slightly, hence the changes in CodeGen tests.
Assembler and disassembler tests were updated and added.

Diff Detail

Repository: rL LLVM

Event Timeline

artem.tamazov updated this revision to Diff 58980. May 30 2016, 11:07 AM

artem.tamazov retitled this revision from to [AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when src2 == VCC..

artem.tamazov updated this object.

artem.tamazov added reviewers: arsenm, tstellarAMD, SamWot.

artem.tamazov set the repository for this revision to rL LLVM.

artem.tamazov added a project: Restricted Project.

artem.tamazov added subscribers: Restricted Project, vpykhtin, nhaustov.

Herald added subscribers: kzhuravl, arsenm. May 30 2016, 11:07 AM

SamWot accepted this revision. May 31 2016, 2:50 AM

SamWot edited edge metadata.

This revision is now accepted and ready to land. May 31 2016, 2:50 AM

Ping

Closed by commit rL271900: [AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when… (authored by artem.tamazov). Jun 6 2016, 8:30 AM

This revision was automatically updated to reflect the committed changes.

Why did this change codegen at all?

In D20796#449776, @arsenm wrote:

Why did this change codegen at all?

The CodeGen itself is not changed, but its output is.

Prior to this change, VOP3 was enforced whenever src2 was specified, even if src2 was VCC. The change allows the 32-bit encoding in that case.
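The rule described above can be sketched as a small model. This is only an illustration inferred from the assembler tests in this revision (vop2.s, vop3.s, vop2-err.s); the function name `select_encoding` and its string return values are hypothetical and are not part of LLVM.

```python
# Hypothetical model of the operand rules after this change; not LLVM code.

def select_encoding(mnemonic, operands):
    """Return "e32" or "e64" for a v_cndmask_b32 line, or raise ValueError."""
    if len(operands) < 4:
        # src2 is mandatory, so three operands are rejected.
        raise ValueError("too few operands for instruction")
    src2 = operands[3]
    if mnemonic.endswith("_e64"):
        # An explicit _e64 suffix keeps the 64-bit form, even for vcc.
        return "e64"
    if src2 == "vcc":
        # vcc matches the implicit VCC read of the 32-bit encoding.
        return "e32"
    if mnemonic.endswith("_e32"):
        # The 32-bit form cannot name an arbitrary SGPR pair.
        raise ValueError("invalid operand for instruction")
    return "e64"


print(select_encoding("v_cndmask_b32", ["v1", "v2", "v3", "vcc"]))     # e32
print(select_encoding("v_cndmask_b32", ["v1", "v3", "v5", "s[4:5]"]))  # e64
```

Before the change, the first call would have corresponded to the 64-bit form, since any explicit src2 forced VOP3.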

arsenm added inline comments. Jun 6 2016, 1:14 PM

llvm/trunk/test/CodeGen/AMDGPU/fceil64.ll
28	Where did the s_and_b64 go?
llvm/trunk/test/CodeGen/AMDGPU/sint_to_fp.i64.ll
28	This change looks like it regressed to now use e64

artem.tamazov added inline comments. Jun 7 2016, 4:06 AM

llvm/trunk/test/CodeGen/AMDGPU/fceil64.ll
28	There was a superfluous s_and_b64. The output of fceil_f64 contained only one s_and_b64 even prior to this change. The issue somehow became apparent after my changes.
llvm/trunk/test/CodeGen/AMDGPU/sint_to_fp.i64.ll
28	Yes, this is a regression. However, the overall stats for this test have improved:

	Function                   codeSize    nSgprs   nVgprs   (prior/after change)
	----------------------------------------------
	s_sint_to_fp_i64_to_f32    152/148     14/14    4/4
	v_sint_to_fp_i64_to_f32    180/188     12/14    7/6
	s_sint_to_fp_v2i64         296/284     17/17    8/8
	v_sint_to_fp_v4i64         656/624     15/17    21/18
	----------------------------------------------
	Total diff                 -40         +4       -4

	For fceil64.ll, the stats have improved as well:

	Function                   codeSize    nSgprs   nVgprs   (prior/after change)
	-----------------------------------------------
	fceil_f64                  132/128     12/12    4/4
	fceil_v2f64                248/240     16/16    7/7
	fceil_v4f64                480/464     20/20    11/11
	fceil_v8f64                916/876     27/27    21/20
	fceil_v16f64               1996/1988   62/62    46/46
	-----------------------------------------------
	Total diff                 -76         0        -1

	As the overall stats look good, I decided that no further investigation is necessary right away.

	Regarding the v_sint_to_fp_i64_to_f32 regression: after looking at the ISA changes, I suspect that the instruction scheduler prefers to save a VGPR at the cost of code size. I can send you both ISAs if you would like to look at this.
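The "Total diff" rows quoted in the comment can be re-derived from the per-function figures. The sketch below copies the prior/after numbers from the comment and sums (after - prior) per column; the dictionary and function names are illustrative only.

```python
# Figures copied from the review comment: (prior, after) pairs for
# code size, SGPR count and VGPR count, in that order.
SINT_TO_FP = {
    "s_sint_to_fp_i64_to_f32": ((152, 148), (14, 14), (4, 4)),
    "v_sint_to_fp_i64_to_f32": ((180, 188), (12, 14), (7, 6)),
    "s_sint_to_fp_v2i64": ((296, 284), (17, 17), (8, 8)),
    "v_sint_to_fp_v4i64": ((656, 624), (15, 17), (21, 18)),
}
FCEIL = {
    "fceil_f64": ((132, 128), (12, 12), (4, 4)),
    "fceil_v2f64": ((248, 240), (16, 16), (7, 7)),
    "fceil_v4f64": ((480, 464), (20, 20), (11, 11)),
    "fceil_v8f64": ((916, 876), (27, 27), (21, 20)),
    "fceil_v16f64": ((1996, 1988), (62, 62), (46, 46)),
}


def total_diff(stats):
    """Sum (after - prior) per column: (code size, SGPRs, VGPRs)."""
    totals = [0, 0, 0]
    for row in stats.values():
        for i, (prior, after) in enumerate(row):
            totals[i] += after - prior
    return tuple(totals)


print(total_diff(SINT_TO_FP))  # (-40, 4, -4)
print(total_diff(FCEIL))       # (-76, 0, -1)
```

Only v_sint_to_fp_i64_to_f32 grows in code size (180 to 188), which is the regression discussed in the thread.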

Revision Contents

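Path                                                  Size
llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.td           78 lines
llvm/trunk/lib/Target/AMDGPU/SIInstructions.td        12 lines
llvm/trunk/test/CodeGen/AMDGPU/fceil64.ll             3 lines
llvm/trunk/test/CodeGen/AMDGPU/sint_to_fp.i64.ll      2 lines
llvm/trunk/test/MC/AMDGPU/vop2-err.s                  6 lines
llvm/trunk/test/MC/AMDGPU/vop2.s                      7 lines
llvm/trunk/test/MC/AMDGPU/vop3.s                      8 lines
llvm/trunk/test/MC/Disassembler/AMDGPU/vop2_vi.txt    3 lines
llvm/trunk/test/MC/Disassembler/AMDGPU/vop3_vi.txt    3 lines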

Diff 59727

llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.td

@@ ... 1,469 lines hidden ... @@ def VOP2b_I32_I1_I32_I32_I1 : VOPProfile<[i32, i32, i32, i1]> {
   let Outs32 = (outs DstRC:$vdst);
   let Outs64 = (outs DstRC:$vdst, SReg_64:$sdst);

   // Suppress src2 implied by type since the 32-bit encoding uses an
   // implicit VCC use.
   let Ins32 = (ins Src0RC32:$src0, Src1RC32:$src1);
 }

+// Read in from vcc or arbitrary SGPR
+def VOP2e_I32_I32_I32_I1 : VOPProfile<[i32, i32, i32, i1]> {
+  let Src0RC32 = VCSrc_32; // See comment in def VOP2b_I32_I1_I32_I32_I1 above.
+  let Asm32 = "$vdst, $src0, $src1, vcc";
+  let Asm64 = "$vdst, $src0, $src1, $src2";
+  let Outs32 = (outs DstRC:$vdst);
+  let Outs64 = (outs DstRC:$vdst);
+
+  // Suppress src2 implied by type since the 32-bit encoding uses an
+  // implicit VCC use.
+  let Ins32 = (ins Src0RC32:$src0, Src1RC32:$src1);
+}
+
 class VOP3b_Profile<ValueType vt> : VOPProfile<[vt, vt, vt, vt]> {
   let Outs64 = (outs DstRC:$vdst, SReg_64:$sdst);
   let Asm64 = "$vdst, $sdst, $src0_modifiers, $src1_modifiers, $src2_modifiers"#"$clamp"#"$omod";
 }

 def VOP3b_F32_I1_F32_F32_F32 : VOP3b_Profile<f32> {
   // FIXME: Hack to stop printing _e64
   let DstRC = RegisterOperand<VGPR_32>;
@@ ... 25 lines hidden ... @@
 def VOPC_I1_I64_I64 : VOPC_Profile<i64>;

 def VOPC_I1_F32_I32 : VOPC_Class_Profile<f32>;
 def VOPC_I1_F64_I32 : VOPC_Class_Profile<f64>;

 def VOP_I64_I64_I32 : VOPProfile <[i64, i64, i32, untyped]>;
 def VOP_I64_I32_I64 : VOPProfile <[i64, i32, i64, untyped]>;
 def VOP_I64_I64_I64 : VOPProfile <[i64, i64, i64, untyped]>;
-def VOP_CNDMASK : VOPProfile <[i32, i32, i32, untyped]> {
-  let Ins32 = (ins Src0RC32:$src0, Src1RC32:$src1);
-  let Ins64 = (ins Src0RC64:$src0, Src1RC64:$src1, SSrc_64:$src2);
-  let Asm64 = "$vdst, $src0, $src1, $src2";
-}

 def VOP_F32_F32_F32_F32 : VOPProfile <[f32, f32, f32, f32]>;
 def VOP_MADAK : VOPProfile <[f32, f32, f32, f32]> {
   field dag Ins32 = (ins VCSrc_32:$src0, VGPR_32:$src1, u32imm:$imm);
   field string Asm32 = "$vdst, $src0, $src1, $imm";
   field bit HasExt = 0;
 }
 def VOP_MADMK : VOPProfile <[f32, f32, f32, f32]> {
@@ ... 326 lines hidden ... @@ class VOP3b_Real_vi <bits<10> op, dag outs, dag ins, string asm, string opName,
   VOP3Common <outs, ins, asm, [], HasMods, VOP3Only>,
   VOP3be_vi <op>,
   SIMCInstr <opName#"_e64", SISubtarget.VI> {
   let AssemblerPredicates = [isVI];
   let DecoderNamespace = "VI";
   let DisableDecoder = DisableVIDecoder;
 }

+class VOP3e_Real_si <bits<9> op, dag outs, dag ins, string asm, string opName,
+    bit HasMods = 0, bit VOP3Only = 0> :
+  VOP3Common <outs, ins, asm, [], HasMods, VOP3Only>,
+  VOP3e <op>,
+  SIMCInstr<opName#"_e64", SISubtarget.SI> {
+  let AssemblerPredicates = [isSICI];
+  let DecoderNamespace = "SICI";
+  let DisableDecoder = DisableSIDecoder;
+}
+
+class VOP3e_Real_vi <bits<10> op, dag outs, dag ins, string asm, string opName,
+    bit HasMods = 0, bit VOP3Only = 0> :
+  VOP3Common <outs, ins, asm, [], HasMods, VOP3Only>,
+  VOP3e_vi <op>,
+  SIMCInstr <opName#"_e64", SISubtarget.VI> {
+  let AssemblerPredicates = [isVI];
+  let DecoderNamespace = "VI";
+  let DisableDecoder = DisableVIDecoder;
+}
+
 multiclass VOP3_m <vop op, dag outs, dag ins, string asm, list<dag> pattern,
     string opName, int NumSrcArgs, bit HasMods = 1, bit VOP3Only = 0> {

   def "" : VOP3_Pseudo <outs, ins, pattern, opName>;

   def _si : VOP3_Real_si <op.SI3, outs, ins, asm, opName, HasMods, VOP3Only>,
     VOP3DisableFields<!if(!eq(NumSrcArgs, 1), 0, 1),
     !if(!eq(NumSrcArgs, 2), 0, 1),
@@ ... 62 lines hidden ... @@ multiclass VOP3b_2_3_m <vop op, dag outs, dag ins, string asm,

   def _si : VOP3b_Real_si <op.SI3, outs, ins, asm, opName, HasMods, VOP3Only>,
     VOP3DisableFields<1, useSrc2Input, HasMods>;

   def _vi : VOP3b_Real_vi <op.VI3, outs, ins, asm, opName, HasMods, VOP3Only>,
     VOP3DisableFields<1, useSrc2Input, HasMods>;
 }

+// Same as VOP3b_2_3_m but no 2nd destination (sdst), e.g. v_cndmask_b32.
+multiclass VOP3e_2_3_m <vop op, dag outs, dag ins, string asm,
+    list<dag> pattern, string opName, string revOp,
+    bit HasMods = 1, bit useSrc2Input = 0, bit VOP3Only = 0> {
+  def "" : VOP3_Pseudo <outs, ins, pattern, opName, HasMods, VOP3Only>;
+
+  def _si : VOP3e_Real_si <op.SI3, outs, ins, asm, opName, HasMods, VOP3Only>,
+    VOP3DisableFields<1, useSrc2Input, HasMods>;
+
+  def _vi : VOP3e_Real_vi <op.VI3, outs, ins, asm, opName, HasMods, VOP3Only>,
+    VOP3DisableFields<1, useSrc2Input, HasMods>;
+}
+
 multiclass VOP3_C_m <vop op, dag outs, dag ins, string asm,
     list<dag> pattern, string opName,
     bit HasMods, bit defExec,
     string revOp, list<SchedReadWrite> sched> {

   def "" : VOP3_Pseudo <outs, ins, pattern, opName, HasMods>,
     VOP2_REV<revOp#"_e64", !eq(revOp, opName)> {
     let Defs = !if(defExec, [EXEC], []);
@@ ... 110 lines hidden ... @@ !if(P.HasModifiers,
     [(set P.DstVT:$vdst,
       (node (P.Src0VT (VOP3Mods0 P.Src0VT:$src0, i32:$src0_modifiers,
         i1:$clamp, i32:$omod)),
       (P.Src1VT (VOP3Mods P.Src1VT:$src1, i32:$src1_modifiers))))],
     [(set P.DstVT:$vdst, (node P.Src0VT:$src0, P.Src1VT:$src1))]),
   opName, revOp, P.HasModifiers>;
 }

+multiclass VOP2e_Helper <vop2 op, string opName, VOPProfile p,
+    list<dag> pat32, list<dag> pat64,
+    string revOp, bit useSGPRInput> {
+
+  let SchedRW = [Write32Bit, WriteSALU] in {
+    let Uses = !if(useSGPRInput, [VCC, EXEC], [EXEC]), Defs = [VCC] in {
+      defm _e32 : VOP2_m <op, opName, p, pat32, revOp>;
+    }
+
+    defm _e64 : VOP3e_2_3_m <op, p.Outs64, p.Ins64, opName#p.Asm64, pat64,
+      opName, revOp, p.HasModifiers, useSGPRInput>;
+  }
+}
+
+multiclass VOP2eInst <vop2 op, string opName, VOPProfile P,
+    SDPatternOperator node = null_frag,
+    string revOp = opName> : VOP2e_Helper <
+  op, opName, P, [],
+  !if(P.HasModifiers,
+    [(set P.DstVT:$vdst,
+      (node (P.Src0VT (VOP3Mods0 P.Src0VT:$src0, i32:$src0_modifiers,
+        i1:$clamp, i32:$omod)),
+      (P.Src1VT (VOP3Mods P.Src1VT:$src1, i32:$src1_modifiers))))],
+    [(set P.DstVT:$vdst, (node P.Src0VT:$src0, P.Src1VT:$src1))]),
+  revOp, !eq(P.NumSrcArgs, 3)
+>;
+
 multiclass VOP2b_Helper <vop2 op, string opName, VOPProfile p,
     list<dag> pat32, list<dag> pat64,
     string revOp, bit useSGPRInput> {

   let SchedRW = [Write32Bit, WriteSALU] in {
     let Uses = !if(useSGPRInput, [VCC, EXEC], [EXEC]), Defs = [VCC] in {
       defm _e32 : VOP2_m <op, opName, p, pat32, revOp>;
     }
@@ ... 1,431 lines hidden ... @@

llvm/trunk/lib/Target/AMDGPU/SIInstructions.td

@@ ... 1,467 lines hidden ... @@ [(set f32:$dst, (AMDGPUinterp_mov (i32 imm:$src0), (i32 imm:$attr_chan),
     (i32 imm:$attr)))]>;

 } // End Uses = [M0, EXEC]

 //===----------------------------------------------------------------------===//
 // VOP2 Instructions
 //===----------------------------------------------------------------------===//

-multiclass V_CNDMASK <vop2 op, string name> {
-  defm _e32 : VOP2_m <op, name, VOP_CNDMASK, [], name>;
-
-  defm _e64 : VOP3_m <
-    op, VOP_CNDMASK.Outs, VOP_CNDMASK.Ins64,
-    name#!cast<string>(VOP_CNDMASK.Asm64), [], name, 3, 0>;
-}
-
-defm V_CNDMASK_B32 : V_CNDMASK<vop2<0x0>, "v_cndmask_b32">;
+defm V_CNDMASK_B32 : VOP2eInst <vop2<0x0, 0x0>, "v_cndmask_b32",
+  VOP2e_I32_I32_I32_I1
+>;

 let isCommutable = 1 in {
 defm V_ADD_F32 : VOP2Inst <vop2<0x3, 0x1>, "v_add_f32",
   VOP_F32_F32_F32, fadd
 >;

 defm V_SUB_F32 : VOP2Inst <vop2<0x4, 0x2>, "v_sub_f32", VOP_F32_F32_F32, fsub>;
 defm V_SUBREV_F32 : VOP2Inst <vop2<0x5, 0x3>, "v_subrev_f32",
@@ ... 2,113 lines hidden ... @@

llvm/trunk/test/CodeGen/AMDGPU/fceil64.ll

@@ ... 19 lines hidden ... @@
 ; SI-DAG: cmp_gt_i32
 ; SI-DAG: cndmask_b32
 ; SI-DAG: cndmask_b32
 ; SI-DAG: cmp_lt_i32
 ; SI-DAG: cndmask_b32
 ; SI-DAG: cndmask_b32
 ; SI-DAG: v_cmp_lt_f64
 ; SI-DAG: v_cmp_lg_f64
-; SI-DAG: s_and_b64
-; SI: v_cndmask_b32
+; SI-DAG: v_cndmask_b32
 ; SI: v_cndmask_b32
 ; SI: v_add_f64
 ; SI: s_endpgm
 define void @fceil_f64(double addrspace(1)* %out, double %x) {
   %y = call double @llvm.ceil.f64(double %x) nounwind readnone
   store double %y, double addrspace(1)* %out
   ret void
 }
@@ ... 68 lines hidden ... @@

llvm/trunk/test/CodeGen/AMDGPU/sint_to_fp.i64.ll

@@ ... 19 lines hidden ... @@
 ; GCN: v_ffbh_u32
 ; GCN: v_cndmask
 ; GCN: v_cndmask

 ; GCN-DAG: v_cmp_eq_i64
 ; GCN-DAG: v_cmp_lt_u64

 ; GCN: v_xor_b32_e32 v{{[0-9]+}}, 0x80000000, v{{[0-9]+}}
-; GCN: v_cndmask_b32_e32 [[SIGN_SEL:v[0-9]+]],
+; GCN: v_cndmask_b32_e{{32|64}} [[SIGN_SEL:v[0-9]+]],
 ; GCN: {{buffer|flat}}_store_dword {{.*}}[[SIGN_SEL]]
 define void @v_sint_to_fp_i64_to_f32(float addrspace(1)* %out, i64 addrspace(1)* %in) #0 {
   %tid = call i32 @llvm.amdgcn.workitem.id.x()
   %in.gep = getelementptr i64, i64 addrspace(1)* %in, i32 %tid
   %out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
   %val = load i64, i64 addrspace(1)* %in.gep
   %result = sitofp i64 %val to float
   store float %result, float addrspace(1)* %out.gep
@@ ... 25 lines hidden ... @@

llvm/trunk/test/MC/AMDGPU/vop2-err.s

 // RUN: not llvm-mc -arch=amdgcn %s 2>&1 | FileCheck %s
 // RUN: not llvm-mc -arch=amdgcn -mcpu=SI %s 2>&1 | FileCheck %s

 //===----------------------------------------------------------------------===//
 // Generic checks
 //===----------------------------------------------------------------------===//

 v_mul_i32_i24 v1, v2, 100
 // CHECK: error: invalid operand for instruction

+v_cndmask_b32 v1, v2, v3
+// CHECK: error: too few operands for instruction
+
 //===----------------------------------------------------------------------===//
 // _e32 checks
 //===----------------------------------------------------------------------===//

 // Immediate src1
 v_mul_i32_i24_e32 v1, v2, 100
 // CHECK: error: invalid operand for instruction

 // sgpr src1
 v_mul_i32_i24_e32 v1, v2, s3
 // CHECK: error: invalid operand for instruction

+v_cndmask_b32_e32 v1, v2, v3, s[0:1]
+// CHECK: error: invalid operand for instruction
+
 //===----------------------------------------------------------------------===//
 // _e64 checks
 //===----------------------------------------------------------------------===//

 // Immediate src0
 v_mul_i32_i24_e64 v1, 100, v3
 // CHECK: error: invalid operand for instruction

@@ ... 32 lines hidden ... @@

llvm/trunk/test/MC/AMDGPU/vop2.s

@@ ... 92 lines hidden ... @@
 // src0 inline src1 sgpr
 // SICI: v_mul_i32_i24_e64 v1, 3, s3 ; encoding: [0x01,0x00,0x12,0xd2,0x83,0x06,0x00,0x00]
 v_mul_i32_i24 v1, 3, s3

 //===----------------------------------------------------------------------===//
 // Instructions
 //===----------------------------------------------------------------------===//

-// GCN: v_cndmask_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x00]
-v_cndmask_b32 v1, v2, v3
+// GCN: v_cndmask_b32_e32 v1, v2, v3, vcc ; encoding: [0x02,0x07,0x02,0x00]
+v_cndmask_b32 v1, v2, v3, vcc
+
+// GCN: v_cndmask_b32_e32 v1, v2, v3, vcc ; encoding: [0x02,0x07,0x02,0x00]
+v_cndmask_b32_e32 v1, v2, v3, vcc

 // SICI: v_readlane_b32 s1, v2, s3 ; encoding: [0x02,0x07,0x02,0x02]
 // VI: v_readlane_b32 s1, v2, s3 ; encoding: [0x01,0x00,0x89,0xd2,0x02,0x07,0x00,0x00]
 v_readlane_b32 s1, v2, s3

 // SICI: v_writelane_b32 v1, s2, s3 ; encoding: [0x02,0x06,0x02,0x04]
 // VI: v_writelane_b32 v1, s2, s3 ; encoding: [0x01,0x00,0x8a,0xd2,0x02,0x06,0x00,0x00]
 v_writelane_b32 v1, s2, s3
@@ ... 380 lines hidden ... @@
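The 4-byte encoding in the v_cndmask_b32_e32 check lines can be reproduced by hand. The sketch below assumes the usual VOP2 field layout (src0 in bits 8:0, vsrc1 in bits 16:9, vdst in bits 24:17, opcode in bits 30:25, bit 31 clear) and that VGPR n is encoded as 256 + n in the src0 field. These layout details are not stated in this revision; they are assumptions that happen to reproduce the bytes shown in the tests. The helper name `encode_vop2` is hypothetical.

```python
import struct

VGPR_BASE = 256  # assumption: VGPR n is operand code 256 + n in src0


def encode_vop2(op, vdst, src0, vsrc1):
    """Pack one VOP2 instruction into its four little-endian bytes."""
    word = (
        (src0 & 0x1FF)            # bits 8:0   src0 (9-bit operand code)
        | ((vsrc1 & 0xFF) << 9)   # bits 16:9  vsrc1 (VGPR index)
        | ((vdst & 0xFF) << 17)   # bits 24:17 vdst (VGPR index)
        | ((op & 0x3F) << 25)     # bits 30:25 opcode; bit 31 stays 0
    )
    return list(struct.pack("<I", word))


# v_cndmask_b32_e32 v1, v2, v3, vcc: opcode 0, vcc is implicit and not encoded.
encoded = encode_vop2(0x0, vdst=1, src0=VGPR_BASE + 2, vsrc1=3)
print(", ".join(f"0x{b:02x}" for b in encoded))  # 0x02, 0x07, 0x02, 0x00
```

Because vcc is implicit in this form, the trailing `vcc` in the assembly text changes only the syntax, not the bytes, which is why the expected encoding is identical before and after the change.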

llvm/trunk/test/MC/AMDGPU/vop3.s

@@ ... 196 lines hidden ... @@

 // TODO: Modifier tests

 v_cndmask_b32 v1, v3, v5, s[4:5]
 // SICI: v_cndmask_b32_e64 v1, v3, v5, s[4:5] ; encoding: [0x01,0x00,0x00,0xd2,0x03,0x0b,0x12,0x00]
 // VI: v_cndmask_b32_e64 v1, v3, v5, s[4:5] ; encoding: [0x01,0x00,0x00,0xd1,0x03,0x0b,0x12,0x00]

+v_cndmask_b32_e64 v1, v3, v5, s[4:5]
+// SICI: v_cndmask_b32_e64 v1, v3, v5, s[4:5] ; encoding: [0x01,0x00,0x00,0xd2,0x03,0x0b,0x12,0x00]
+// VI: v_cndmask_b32_e64 v1, v3, v5, s[4:5] ; encoding: [0x01,0x00,0x00,0xd1,0x03,0x0b,0x12,0x00]
+
+v_cndmask_b32_e64 v1, v3, v5, vcc
+// SICI: v_cndmask_b32_e64 v1, v3, v5, vcc ; encoding: [0x01,0x00,0x00,0xd2,0x03,0x0b,0xaa,0x01]
+// VI: v_cndmask_b32_e64 v1, v3, v5, vcc ; encoding: [0x01,0x00,0x00,0xd1,0x03,0x0b,0xaa,0x01]
+
 //TODO: readlane, writelane

 v_add_f32 v1, v3, s5
 // SICI: v_add_f32_e64 v1, v3, s5 ; encoding: [0x01,0x00,0x06,0xd2,0x03,0x0b,0x00,0x00]
 // VI: v_add_f32_e64 v1, v3, s5 ; encoding: [0x01,0x00,0x01,0xd1,0x03,0x0b,0x00,0x00]

 v_sub_f32 v1, v3, s5
 // SICI: v_sub_f32_e64 v1, v3, s5 ; encoding: [0x01,0x00,0x08,0xd2,0x03,0x0b,0x00,0x00]
@@ ... 139 lines hidden ... @@
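The 8-byte _e64 encodings above, including the difference between the SICI and VI check lines, can be sketched the same way. The layout used here is an assumption, not something this revision states: vdst in bits 7:0, the VOP3 opcode at bit 17 on SI/CI and bit 16 on VI, 0b110100 in bits 31:26, and three 9-bit sources in the second dword. The sketch further assumes that a VOP2 opcode N becomes 0x100 + N in VOP3, that vcc is operand code 106, and that VGPR n is 256 + n. These assumptions reproduce the bytes in the tests. `encode_vop3` is a hypothetical helper.

```python
import struct

VOP3_ENCODING = 0b110100  # assumed value of bits 31:26 of the first dword
VGPR_BASE = 256           # assumed operand code of v0
VCC = 106                 # assumed operand code of vcc
V_CNDMASK_B32_VOP3 = 0x100 + 0x0  # assumed VOP3 opcode for VOP2 opcode 0


def encode_vop3(op, vdst, src0, src1, src2, target):
    """Pack a VOP3 instruction without modifiers into 8 little-endian bytes."""
    op_shift = 16 if target == "VI" else 17  # opcode field position differs
    lo = (vdst & 0xFF) | (op << op_shift) | (VOP3_ENCODING << 26)
    hi = (src0 & 0x1FF) | ((src1 & 0x1FF) << 9) | ((src2 & 0x1FF) << 18)
    return list(struct.pack("<II", lo, hi))


def show(raw):
    return "[" + ",".join(f"0x{b:02x}" for b in raw) + "]"


# v_cndmask_b32_e64 v1, v3, v5, s[4:5] (src2 = 4 selects the pair starting at s4)
print(show(encode_vop3(V_CNDMASK_B32_VOP3, 1, VGPR_BASE + 3, VGPR_BASE + 5, 4, "VI")))
# v_cndmask_b32_e64 v1, v3, v5, vcc
print(show(encode_vop3(V_CNDMASK_B32_VOP3, 1, VGPR_BASE + 3, VGPR_BASE + 5, VCC, "VI")))
```

Under these assumptions, the only bytes that differ between the s[4:5] and vcc variants are the two that hold the src2 field.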

llvm/trunk/test/MC/Disassembler/AMDGPU/vop2_vi.txt

 # RUN: llvm-mc -arch=amdgcn -mcpu=tonga -disassemble -show-encoding < %s | FileCheck %s -check-prefix=VI

+# VI: v_cndmask_b32_e32 v1, v2, v3, vcc ; encoding: [0x02,0x07,0x02,0x00]
+0x02 0x07 0x02 0x00
+
 # VI: v_readlane_b32 s1, v2, s3 ; encoding: [0x01,0x00,0x89,0xd2,0x02,0x07,0x00,0x00]
 0x01 0x00 0x89 0xd2 0x02 0x07 0x00 0x00

 # VI: v_writelane_b32 v1, s2, s3 ; encoding: [0x01,0x00,0x8a,0xd2,0x02,0x06,0x00,0x00]
 0x01 0x00 0x8a 0xd2 0x02 0x06 0x00 0x00

 # VI: v_add_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x02]
 0x02 0x07 0x02 0x02
@@ ... 243 lines hidden ... @@

llvm/trunk/test/MC/Disassembler/AMDGPU/vop3_vi.txt

@@ ... 105 lines hidden ... @@
 0x01 0x80 0x5b 0xd1 0x02 0x01 0x00 0x18

 # VI: v_add_f32_e64 v1, v3, v5 ; encoding: [0x01,0x00,0x01,0xd1,0x03,0x0b,0x02,0x00]
 0x01 0x00 0x01 0xd1 0x03 0x0b 0x02 0x00

 # VI: v_cndmask_b32_e64 v1, v3, v5, s[4:5] ; encoding: [0x01,0x00,0x00,0xd1,0x03,0x0b,0x12,0x00]
 0x01 0x00 0x00 0xd1 0x03 0x0b 0x12 0x00

+# VI: v_cndmask_b32_e64 v1, v3, v5, vcc ; encoding: [0x01,0x00,0x00,0xd1,0x03,0x0b,0xaa,0x01]
+0x01 0x00 0x00 0xd1 0x03 0x0b 0xaa 0x01
+
 # VI: v_add_f32_e64 v1, v3, s5 ; encoding: [0x01,0x00,0x01,0xd1,0x03,0x0b,0x00,0x00]
 0x01 0x00 0x01 0xd1 0x03 0x0b 0x00 0x00

 # VI: v_sub_f32_e64 v1, v3, s5 ; encoding: [0x01,0x00,0x02,0xd1,0x03,0x0b,0x00,0x00]
 0x01 0x00 0x02 0xd1 0x03 0x0b 0x00 0x00

 # VI: v_subrev_f32_e64 v1, v3, s5 ; encoding: [0x01,0x00,0x03,0xd1,0x03,0x0b,0x00,0x00]
 0x01 0x00 0x03 0xd1 0x03 0x0b 0x00 0x00
@@ ... 93 lines hidden ... @@
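Going the other way, the operands printed by the disassembler for the two v_cndmask_b32_e64 byte sequences can be recovered with a few shifts. This is a rough sketch and not the LLVM disassembler. It assumes the three 9-bit source fields sit in the second dword, that operand codes 256 and up are VGPRs, that 106 is vcc, and that smaller codes are SGPRs. It covers only the operand kinds that appear in these tests, and `operand_name` and `decode_vop3_operands` are hypothetical names.

```python
import struct

VCC = 106         # assumed operand code of vcc
VGPR_BASE = 256   # assumed operand code of v0


def operand_name(code, is_64bit=False):
    """Name an operand code; handles only VGPRs, vcc and plain SGPRs."""
    if code >= VGPR_BASE:
        return f"v{code - VGPR_BASE}"
    if code == VCC:
        return "vcc"
    return f"s[{code}:{code + 1}]" if is_64bit else f"s{code}"


def decode_vop3_operands(raw):
    """Return [vdst, src0, src1, src2] names for an 8-byte VOP3 encoding."""
    lo, hi = struct.unpack("<II", bytes(raw))
    vdst = lo & 0xFF
    src0 = hi & 0x1FF
    src1 = (hi >> 9) & 0x1FF
    src2 = (hi >> 18) & 0x1FF
    # src2 of v_cndmask_b32 is a 64-bit condition mask, hence the pair syntax.
    return [f"v{vdst}", operand_name(src0), operand_name(src1),
            operand_name(src2, is_64bit=True)]


print(decode_vop3_operands([0x01, 0x00, 0x00, 0xd1, 0x03, 0x0b, 0x12, 0x00]))
# ['v1', 'v3', 'v5', 's[4:5]']
print(decode_vop3_operands([0x01, 0x00, 0x00, 0xd1, 0x03, 0x0b, 0xaa, 0x01]))
# ['v1', 'v3', 'v5', 'vcc']
```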
