This is an archive of the discontinued LLVM Phabricator instance.

lib/Target/PowerPC/PPCInstrAltivec.td
931	Hi Nemanja, I have one concern: can these two hardware instructions for vector floating point be mapped exactly onto these two ISD nodes? The description of fmaxnum/fminnum says "in the case where a single input is NaN, the non-NaN input is returned.", while the ISA describes vmaxfp/vminfp as "The maximum of +0 and -0 is +0. The maximum of any value and a NaN is a QNaN." Wouldn't fmaxnan/fminnan be a better fit?
lib/Target/PowerPC/PPCInstrVSX.td
1537	I'm not sure why you don't use patterns similar to the VMAXSW ones in PPCInstrAltivec.td, just placed in the HasP8Vector scope. Is there a special reason for the COPY_TO_REGCLASS?
test/CodeGen/PowerPC/vec-min-max.ll
54	Adding more cases to cover sge/sle seems trivial, but wouldn't it improve coverage?
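The semantic gap raised in the comment on PPCInstrAltivec.td line 931 can be sketched with two scalar C++ models. Both functions are hypothetical and written only for illustration; they are not LLVM or ISA definitions. `maxnum_model` follows the FMAXNUM wording quoted above (a single NaN input yields the other operand), and `vmaxfp_model` follows the quoted vmaxfp wording (any NaN input yields a QNaN, and max(+0, -0) is +0). As a simplification, `vmaxfp_model` returns a default quiet NaN rather than tracking which input NaN is propagated.

```cpp
#include <cmath>
#include <limits>

// Hypothetical model of the FMAXNUM contract quoted in the review: if exactly
// one input is NaN, the non-NaN input is returned (NaN only if both are NaN).
float maxnum_model(float a, float b) {
  if (std::isnan(a))
    return b;
  if (std::isnan(b))
    return a;
  return a > b ? a : b;
}

// Hypothetical model of the vmaxfp wording quoted from the ISA: any NaN input
// produces a QNaN, and the maximum of +0 and -0 is +0.
// Simplification: returns a default quiet NaN, not a quieted input NaN.
float vmaxfp_model(float a, float b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<float>::quiet_NaN();
  if (a == 0.0f && b == 0.0f)        // +0 and -0 compare equal,
    return std::signbit(a) ? b : a;  // so prefer the operand without a sign bit.
  return a > b ? a : b;
}
```

With a NaN and 1.0f as inputs, `maxnum_model` returns 1.0f while `vmaxfp_model` returns NaN, which is the mismatch discussed in this thread.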

jedilyn added a reviewer: jedilyn. Jul 18 2018, 6:13 AM

@jedilyn Hi Ke Wen, thanks for your comments. This needs some cleanup with regard to which instructions match the semantics of F{MIN|MAX}NUM vs. F{MIN|MAX}NAN. I'll clean that up and re-post this for your review. Thanks.

lib/Target/PowerPC/PPCInstrAltivec.td
931	I am really sorry about such a long delay in responding to this. You are absolutely right. The correct nodes are `FMINNAN` and `FMAXNAN`. I think what I meant to use here are `XVMAXDP, XVMINDP, XVMAXSP, XVMINSP`. Those have the mentioned semantics (i.e. comparing a value and a NaN returns the value). And I don't think we need to worry about signalling NaNs at this time.
lib/Target/PowerPC/PPCInstrVSX.td
1537	These instructions are new in ISA 2.07 so they have to be in the P8Vector block. Also, the `COPY_TO_REGCLASS` is needed because `VRRC` registers cannot contain `v2i64` values.

jedilyn mentioned this in D54783: [PowerPC] suboptimal vec_abs for some cases on P9. Dec 14 2018, 12:35 AM

RKSimon added a subscriber: RKSimon. Dec 14 2018, 12:36 AM

Herald added a subscriber: jsji. Dec 14 2018, 12:36 AM

This covers PR39130 right?

Maybe worth adding the new vec-min-max.ll test file to trunk with current codegen so this patch shows the improved codegen diff?

test/CodeGen/PowerPC/vec-min-max.ll
6	You might be better off using common prefixes to share (and reduce) checks: `--check-prefixes=CHECK,P8VEC` on one RUN line and `--check-prefixes=CHECK,NOP8VEC` on the other.
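The idea behind the suggestion: each RUN line activates every prefix in its list, so output common to both CPUs is written once under `CHECK`, and only diverging output (such as the v2i64 cases) needs `P8VEC` or `NOP8VEC` lines. The C++ below is a deliberately simplified, hypothetical model of that prefix selection. It is not FileCheck's implementation, and it only recognises the plain `:`, `-NEXT:`, `-LABEL:` and `-NOT:` directive forms.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical, simplified model: returns the check prefix of a directive
// line such as "; P8VEC-NEXT: vmaxsd 2, 2, 3" (here "P8VEC").
std::string directivePrefix(const std::string &line) {
  std::size_t start = line.find_first_not_of("; ");
  if (start == std::string::npos)
    return "";
  std::size_t colon = line.find(':', start);
  if (colon == std::string::npos)
    return "";
  std::string tag = line.substr(start, colon - start);
  static const char *const kSuffixes[] = {"-NEXT", "-LABEL", "-NOT"};
  for (const char *suffix : kSuffixes) {
    const std::string s(suffix);
    if (tag.size() > s.size() &&
        tag.compare(tag.size() - s.size(), s.size(), s) == 0)
      return tag.substr(0, tag.size() - s.size());
  }
  return tag;
}

// A directive is checked for a RUN line when its prefix is one of the
// prefixes that RUN line passes via --check-prefixes.
bool isActive(const std::string &line, const std::set<std::string> &prefixes) {
  return prefixes.count(directivePrefix(line)) != 0;
}
```

Under this model, `; CHECK-NEXT: vmaxsb 2, 2, 3` is active for both RUN lines, while `; NOP8VEC-NEXT: xxswapd 0, 35` is only active for the pwr7 run.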

In D47332#1330931, @RKSimon wrote:

This covers PR39130 right?

Maybe worth adding the new vec-min-max.ll test file to trunk with current codegen so this patch shows the improved codegen diff?

Yes, this will fix the PR for vector types. Scalar types will come later.

test/CodeGen/PowerPC/vec-min-max.ll
6	Oh cool. I was not aware of this functionality. Thank you.

Updated to remove the patterns for the Altivec versions of vector floating-point min/max, as they have IEEE semantics with respect to NaN handling. A subsequent patch will legalize the _IEEE versions of the nodes for single precision and provide patterns to match them to vmaxfp/vminfp.

Is anything happening with this? We've hit PPC issues with ISD::ABS on https://reviews.llvm.org/D49837 and I noticed that PPCTargetLowering::LowerABS has a reference to this patch.

Herald added a project: Restricted Project. Mar 11 2019, 4:22 AM

Herald added a subscriber: jdoerfert.

In D47332#1424366, @RKSimon wrote:

Is anything happening with this? We've hit PPC issues with ISD::ABS on https://reviews.llvm.org/D49837 and I noticed that PPCTargetLowering::LowerABS has a reference to this patch.

Thanks for pointing this out - I forgot about this patch. It should be ready to commit, perhaps @hfinkel, @echristo or @jsji can go through this and give their opinion on whether we can go ahead with this.

LGTM. Thanks for your time!

This revision is now accepted and ready to land. Mar 12 2019, 8:44 PM

Closed by commit rL362759: [PowerPC] Exploit the vector min/max instructions (authored by nemanjai). Jun 6 2019, 4:47 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

| Path | Size |
|---|---|
| lib/Target/PowerPC/PPCISelLowering.cpp | 18 lines |
| lib/Target/PowerPC/PPCInstrAltivec.td | 26 lines |
| lib/Target/PowerPC/PPCInstrVSX.td | 21 lines |
| test/CodeGen/PowerPC/ctr-minmaxnum.ll | 21 lines |
| test/CodeGen/PowerPC/vec-min-max.ll | 239 lines |

Diff 179727

lib/Target/PowerPC/PPCISelLowering.cpp


@@ … @@ if (Subtarget.hasAltivec()) {
     // First set operation action for all vector types to expand. Then we
     // will selectively turn on ones that can be effectively codegen'd.
     for (MVT VT : MVT::vector_valuetypes()) {
       // add/sub are legal for all supported vector VT's.
       setOperationAction(ISD::ADD, VT, Legal);
       setOperationAction(ISD::SUB, VT, Legal);
       setOperationAction(ISD::ABS, VT, Custom);

+      // For v2i64, these are only valid with P8Vector. This is corrected after
+      // the loop.
+      setOperationAction(ISD::SMAX, VT, Legal);
+      setOperationAction(ISD::SMIN, VT, Legal);
+      setOperationAction(ISD::UMAX, VT, Legal);
+      setOperationAction(ISD::UMIN, VT, Legal);
+
+      if (Subtarget.hasVSX()) {
+        setOperationAction(ISD::FMAXNUM, VT, Legal);
+        setOperationAction(ISD::FMINNUM, VT, Legal);
+      }
+
       // Vector instructions introduced in P8
       if (Subtarget.hasP8Altivec() && (VT.SimpleTy != MVT::v1i128)) {
         setOperationAction(ISD::CTPOP, VT, Legal);
         setOperationAction(ISD::CTLZ, VT, Legal);
       }
       else {
         setOperationAction(ISD::CTPOP, VT, Expand);
         setOperationAction(ISD::CTLZ, VT, Expand);
@@ … @@ for (MVT VT : MVT::vector_valuetypes()) {

       for (MVT InnerVT : MVT::vector_valuetypes()) {
         setTruncStoreAction(VT, InnerVT, Expand);
         setLoadExtAction(ISD::SEXTLOAD, VT, InnerVT, Expand);
         setLoadExtAction(ISD::ZEXTLOAD, VT, InnerVT, Expand);
         setLoadExtAction(ISD::EXTLOAD, VT, InnerVT, Expand);
       }
     }
+    if (!Subtarget.hasP8Vector()) {
+      setOperationAction(ISD::SMAX, MVT::v2i64, Expand);
+      setOperationAction(ISD::SMIN, MVT::v2i64, Expand);
+      setOperationAction(ISD::UMAX, MVT::v2i64, Expand);
+      setOperationAction(ISD::UMIN, MVT::v2i64, Expand);
+    }

     // We can custom expand all VECTOR_SHUFFLEs to VPERM, others we can handle
     // with merges, splats, etc.
     setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v16i8, Custom);

     setOperationAction(ISD::AND , MVT::v4i32, Legal);
     setOperationAction(ISD::OR , MVT::v4i32, Legal);
     setOperationAction(ISD::XOR , MVT::v4i32, Legal);
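The hunk above uses a two-step structure: mark SMAX/SMIN/UMAX/UMIN Legal for every vector type inside the loop, then downgrade v2i64 to Expand after the loop when P8Vector is unavailable. A hypothetical sketch of that pattern with a plain map follows. `Action`, `Key` and `buildMinMaxActions` are invented for illustration and are not LLVM's TargetLowering API, and only the four integer vector types relevant here are modelled (the real loop visits every vector value type).

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for LLVM's operation-action table; these are not
// the real TargetLowering types or API.
enum class Action { Legal, Expand };
using Key = std::pair<std::string, std::string>;  // (opcode, value type)

std::map<Key, Action> buildMinMaxActions(bool hasP8Vector) {
  const std::vector<std::string> vectorTypes = {"v16i8", "v8i16", "v4i32",
                                                "v2i64"};
  const std::vector<std::string> minMaxOps = {"SMAX", "SMIN", "UMAX", "UMIN"};
  std::map<Key, Action> actions;
  // Step 1 (inside the loop): integer min/max is Legal for every vector type.
  for (const std::string &vt : vectorTypes)
    for (const std::string &op : minMaxOps)
      actions[{op, vt}] = Action::Legal;
  // Step 2 (after the loop): the doubleword forms are ISA 2.07 instructions,
  // so without P8Vector the v2i64 entries are corrected to Expand.
  if (!hasP8Vector)
    for (const std::string &op : minMaxOps)
      actions[{op, "v2i64"}] = Action::Expand;
  return actions;
}
```

Setting the common case first and patching the one exception afterwards keeps the loop body free of per-type special cases, which matches the comment in the hunk.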

lib/Target/PowerPC/PPCInstrAltivec.td

@@ … @@
 def : Pat<(v2i64 (bitconvert (v1i128 VRRC:$src))), (v2i64 VRRC:$src)>;

 def : Pat<(v1i128 (bitconvert (v16i8 VRRC:$src))), (v1i128 VRRC:$src)>;
 def : Pat<(v1i128 (bitconvert (v8i16 VRRC:$src))), (v1i128 VRRC:$src)>;
 def : Pat<(v1i128 (bitconvert (v4i32 VRRC:$src))), (v1i128 VRRC:$src)>;
 def : Pat<(v1i128 (bitconvert (v4f32 VRRC:$src))), (v1i128 VRRC:$src)>;
 def : Pat<(v1i128 (bitconvert (v2i64 VRRC:$src))), (v1i128 VRRC:$src)>;

+// Max/Min
+def : Pat<(v16i8 (umax v16i8:$src1, v16i8:$src2)),
+          (v16i8 (VMAXUB $src1, $src2))>;
+def : Pat<(v16i8 (smax v16i8:$src1, v16i8:$src2)),
+          (v16i8 (VMAXSB $src1, $src2))>;
+def : Pat<(v8i16 (umax v8i16:$src1, v8i16:$src2)),
+          (v8i16 (VMAXUH $src1, $src2))>;
+def : Pat<(v8i16 (smax v8i16:$src1, v8i16:$src2)),
+          (v8i16 (VMAXSH $src1, $src2))>;
+def : Pat<(v4i32 (umax v4i32:$src1, v4i32:$src2)),
+          (v4i32 (VMAXUW $src1, $src2))>;
+def : Pat<(v4i32 (smax v4i32:$src1, v4i32:$src2)),
+          (v4i32 (VMAXSW $src1, $src2))>;
+def : Pat<(v16i8 (umin v16i8:$src1, v16i8:$src2)),
+          (v16i8 (VMINUB $src1, $src2))>;
+def : Pat<(v16i8 (smin v16i8:$src1, v16i8:$src2)),
+          (v16i8 (VMINSB $src1, $src2))>;
+def : Pat<(v8i16 (umin v8i16:$src1, v8i16:$src2)),
+          (v8i16 (VMINUH $src1, $src2))>;
+def : Pat<(v8i16 (smin v8i16:$src1, v8i16:$src2)),
+          (v8i16 (VMINSH $src1, $src2))>;
+def : Pat<(v4i32 (umin v4i32:$src1, v4i32:$src2)),
+          (v4i32 (VMINUW $src1, $src2))>;
+def : Pat<(v4i32 (smin v4i32:$src1, v4i32:$src2)),
+          (v4i32 (VMINSW $src1, $src2))>;

 // Shuffles.

 // Match vsldoi(x,x), vpkuwum(x,x), vpkuhum(x,x)
 def:Pat<(vsldoi_unary_shuffle:$in v16i8:$vA, undef),
         (VSLDOI $vA, $vA, (VSLDOI_unary_get_imm $in))>;
 def:Pat<(vpkuwum_unary_shuffle v16i8:$vA, undef),
         (VPKUWUM $vA, $vA)>;
 def:Pat<(vpkuhum_unary_shuffle v16i8:$vA, undef),
         (VPKUHUM $vA, $vA)>;
 def:Pat<(vsldoi_shuffle:$SH v16i8:$vA, v16i8:$vB),
         (VSLDOI v16i8:$vA, v16i8:$vB, (VSLDOI_get_imm $SH))>;
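The signed and unsigned patterns above map to different instructions (VMAXSB vs. VMAXUB, and so on) because the same bit pattern orders differently when read as signed or unsigned. A minimal C++ model of the v16i8 case, using hypothetical helper names and plain arrays in place of vector registers:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V16U8 = std::array<std::uint8_t, 16>;

// Lane-wise unsigned maximum: what (umax v16i8) computes, mapped to VMAXUB.
V16U8 umaxLanes(const V16U8 &a, const V16U8 &b) {
  V16U8 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = a[i] > b[i] ? a[i] : b[i];
  return r;
}

// Lane-wise signed maximum: each byte is read as int8_t before comparing,
// as in the (smax v16i8) -> VMAXSB pattern.
V16U8 smaxLanes(const V16U8 &a, const V16U8 &b) {
  V16U8 r{};
  for (std::size_t i = 0; i < r.size(); ++i) {
    auto x = static_cast<std::int8_t>(a[i]);
    auto y = static_cast<std::int8_t>(b[i]);
    r[i] = static_cast<std::uint8_t>(x > y ? x : y);
  }
  return r;
}
```

For a lane holding 0x80 (-128 when signed) against 0x01, `umaxLanes` keeps 0x80 while `smaxLanes` keeps 0x01.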

lib/Target/PowerPC/PPCInstrVSX.td

@@ … @@ def : Pat<(vselect v4i32:$vA, v4i32:$vB, v4i32:$vC),
           (XXSEL $vC, $vB, $vA)>;
 def : Pat<(vselect v2i64:$vA, v2i64:$vB, v2i64:$vC),
           (XXSEL $vC, $vB, $vA)>;
 def : Pat<(vselect v4i32:$vA, v4f32:$vB, v4f32:$vC),
           (XXSEL $vC, $vB, $vA)>;
 def : Pat<(vselect v2i64:$vA, v2f64:$vB, v2f64:$vC),
           (XXSEL $vC, $vB, $vA)>;

+def : Pat<(v4f32 (fmaxnum v4f32:$src1, v4f32:$src2)),
+          (v4f32 (XVMAXSP $src1, $src2))>;
+def : Pat<(v4f32 (fminnum v4f32:$src1, v4f32:$src2)),
+          (v4f32 (XVMINSP $src1, $src2))>;
+def : Pat<(v2f64 (fmaxnum v2f64:$src1, v2f64:$src2)),
+          (v2f64 (XVMAXDP $src1, $src2))>;
+def : Pat<(v2f64 (fminnum v2f64:$src1, v2f64:$src2)),
+          (v2f64 (XVMINDP $src1, $src2))>;

 let Predicates = [IsLittleEndian] in {
 def : Pat<(f64 (PPCfcfid (PPCmtvsra (i64 (vector_extract v2i64:$S, 0))))),
           (f64 (XSCVSXDDP (COPY_TO_REGCLASS (XXPERMDI $S, $S, 2), VSFRC)))>;
 def : Pat<(f64 (PPCfcfid (PPCmtvsra (i64 (vector_extract v2i64:$S, 1))))),
           (f64 (XSCVSXDDP (COPY_TO_REGCLASS (f64 (COPY_TO_REGCLASS $S, VSRC)), VSFRC)))>;
 def : Pat<(f64 (PPCfcfidu (PPCmtvsra (i64 (vector_extract v2i64:$S, 0))))),
           (f64 (XSCVUXDDP (COPY_TO_REGCLASS (XXPERMDI $S, $S, 2), VSFRC)))>;
 def : Pat<(f64 (PPCfcfidu (PPCmtvsra (i64 (vector_extract v2i64:$S, 1))))),
@@ … @@ let AddedComplexity = 400 in { // Prefer VSX patterns over non-VSX patterns.
 // Instructions for converting float to i32 feeding a store.
 def : Pat<(PPCstore_scal_int_from_vsr
             (f64 (PPCcv_fp_to_sint_in_vsr f64:$src)), xoaddr:$dst, 4),
           (STIWX (XSCVDPSXWS f64:$src), xoaddr:$dst)>;
 def : Pat<(PPCstore_scal_int_from_vsr
             (f64 (PPCcv_fp_to_uint_in_vsr f64:$src)), xoaddr:$dst, 4),
           (STIWX (XSCVDPUXWS f64:$src), xoaddr:$dst)>;

+def : Pat<(v2i64 (smax v2i64:$src1, v2i64:$src2)),
+          (v2i64 (VMAXSD (COPY_TO_REGCLASS $src1, VRRC),
+                         (COPY_TO_REGCLASS $src2, VRRC)))>;
+def : Pat<(v2i64 (umax v2i64:$src1, v2i64:$src2)),
+          (v2i64 (VMAXUD (COPY_TO_REGCLASS $src1, VRRC),
+                         (COPY_TO_REGCLASS $src2, VRRC)))>;
+def : Pat<(v2i64 (smin v2i64:$src1, v2i64:$src2)),
+          (v2i64 (VMINSD (COPY_TO_REGCLASS $src1, VRRC),
+                         (COPY_TO_REGCLASS $src2, VRRC)))>;
+def : Pat<(v2i64 (umin v2i64:$src1, v2i64:$src2)),
+          (v2i64 (VMINUD (COPY_TO_REGCLASS $src1, VRRC),
+                         (COPY_TO_REGCLASS $src2, VRRC)))>;
 } // AddedComplexity = 400
 } // HasP8Vector

 let UseVSXReg = 1, AddedComplexity = 400 in {
 let Predicates = [HasDirectMove] in {
 // VSX direct move instructions
 def MFVSRD : XX1_RS6_RD5_XO<31, 51, (outs g8rc:$rA), (ins vsfrc:$XT),
                             "mfvsrd $rA, $XT", IIC_VecGeneral,

test/CodeGen/PowerPC/ctr-minmaxnum.ll

@@ … @@ loop_body:
   %2 = icmp eq i64 %1, 4
   br i1 %2, label %loop_exit, label %loop_body

 loop_exit:
   ret void
 }

 ; CHECK-LABEL: test1v:
-; CHECK: bl fminf
-; CHECK-NOT: mtctr
-; CHECK: bl fminf
-; CHECK-NOT: mtctr
-; CHECK: bl fminf
-; CHECK-NOT: mtctr
-; CHECK: bl fminf
+; CHECK: xvminsp
+; CHECK-NOT: bl fminf
+; CHECK: mtctr
 ; CHECK-NOT: bl fminf
 ; CHECK: blr

 ; QPX-LABEL: test1v:
 ; QPX: mtctr
 ; QPX-NOT: bl fminf
 ; QPX: blr

@@ … @@ loop_body:
   %2 = icmp eq i64 %1, 4
   br i1 %2, label %loop_exit, label %loop_body

 loop_exit:
   ret void
 }

 ; CHECK-LABEL: test2v:
-; CHECK: bl fmax
-; CHECK-NOT: mtctr
-; CHECK: bl fmax
-; CHECK-NOT: mtctr
-; CHECK: bl fmax
-; CHECK-NOT: mtctr
-; CHECK: bl fmax
+; CHECK: xvmaxdp
+; CHECK: xvmaxdp
+; CHECK-NOT: bl fmax
+; CHECK: mtctr
 ; CHECK-NOT: bl fmax
 ; CHECK: blr

 ; QPX-LABEL: test2v:
 ; QPX: mtctr
 ; QPX-NOT: bl fmax
 ; QPX: blr

test/CodeGen/PowerPC/vec-min-max.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=powerpc64le-unknown-unknown -mcpu=pwr8 \
				; RUN: -verify-machineinstrs | FileCheck %s
				; RUN: llc < %s -mtriple=powerpc64le-unknown-unknown -mcpu=pwr7 \
				; RUN: -verify-machineinstrs | FileCheck %s --check-prefix=NOP8VEC
				define <16 x i8> @getsmaxi8(<16 x i8> %a, <16 x i8> %b) {
				; CHECK-LABEL: getsmaxi8:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vmaxsb 2, 2, 3
				; CHECK-NEXT: blr
				;
				; NOP8VEC-LABEL: getsmaxi8:
				; NOP8VEC: # %bb.0: # %entry
				; NOP8VEC-NEXT: vmaxsb 2, 2, 3
				; NOP8VEC-NEXT: blr
				entry:
				%0 = icmp sgt <16 x i8> %a, %b
				%1 = select <16 x i1> %0, <16 x i8> %a, <16 x i8> %b
				ret <16 x i8> %1
				}

				define <8 x i16> @getsmaxi16(<8 x i16> %a, <8 x i16> %b) {
				; CHECK-LABEL: getsmaxi16:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vmaxsh 2, 2, 3
				; CHECK-NEXT: blr
				;
				; NOP8VEC-LABEL: getsmaxi16:
				; NOP8VEC: # %bb.0: # %entry
				; NOP8VEC-NEXT: vmaxsh 2, 2, 3
				; NOP8VEC-NEXT: blr
				entry:
				%0 = icmp sgt <8 x i16> %a, %b
				%1 = select <8 x i1> %0, <8 x i16> %a, <8 x i16> %b
				ret <8 x i16> %1
				}

				define <4 x i32> @getsmaxi32(<4 x i32> %a, <4 x i32> %b) {
				; CHECK-LABEL: getsmaxi32:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vmaxsw 2, 2, 3
				; CHECK-NEXT: blr
				;
				; NOP8VEC-LABEL: getsmaxi32:
				; NOP8VEC: # %bb.0: # %entry
				; NOP8VEC-NEXT: vmaxsw 2, 2, 3
				; NOP8VEC-NEXT: blr
				entry:
				%0 = icmp sgt <4 x i32> %a, %b
				%1 = select <4 x i1> %0, <4 x i32> %a, <4 x i32> %b
				ret <4 x i32> %1
				}

				define <2 x i64> @getsmaxi64(<2 x i64> %a, <2 x i64> %b) {
				; CHECK-LABEL: getsmaxi64:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vmaxsd 2, 2, 3
				; CHECK-NEXT: blr
				;
				; NOP8VEC-LABEL: getsmaxi64:
				; NOP8VEC: # %bb.0: # %entry
				; NOP8VEC-NEXT: xxswapd 0, 35
				; NOP8VEC-NEXT: addi 3, 1, -32
				; NOP8VEC-NEXT: addi 4, 1, -48
				; NOP8VEC-NEXT: xxswapd 1, 34
				; NOP8VEC-NEXT: stxvd2x 0, 0, 3
				; NOP8VEC-NEXT: stxvd2x 1, 0, 4
				; NOP8VEC-NEXT: ld 3, -24(1)
				; NOP8VEC-NEXT: ld 4, -40(1)
				; NOP8VEC-NEXT: cmpd 4, 3
				; NOP8VEC-NEXT: li 3, 0
				; NOP8VEC-NEXT: li 4, -1
				; NOP8VEC-NEXT: isel 5, 4, 3, 1
				; NOP8VEC-NEXT: std 5, -8(1)
				; NOP8VEC-NEXT: ld 5, -32(1)
				; NOP8VEC-NEXT: ld 6, -48(1)
				; NOP8VEC-NEXT: cmpd 6, 5
				; NOP8VEC-NEXT: isel 3, 4, 3, 1
				; NOP8VEC-NEXT: std 3, -16(1)
				; NOP8VEC-NEXT: addi 3, 1, -16
				; NOP8VEC-NEXT: lxvd2x 0, 0, 3
				; NOP8VEC-NEXT: xxswapd 36, 0
				; NOP8VEC-NEXT: xxsel 34, 35, 34, 36
				; NOP8VEC-NEXT: blr
				entry:
				%0 = icmp sgt <2 x i64> %a, %b
				%1 = select <2 x i1> %0, <2 x i64> %a, <2 x i64> %b
				ret <2 x i64> %1
				}

				define <4 x float> @getsmaxf32(<4 x float> %a, <4 x float> %b) {
				; CHECK-LABEL: getsmaxf32:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xvmaxsp 34, 34, 35
				; CHECK-NEXT: blr
				;
				; NOP8VEC-LABEL: getsmaxf32:
				; NOP8VEC: # %bb.0: # %entry
				; NOP8VEC-NEXT: xvmaxsp 34, 34, 35
				; NOP8VEC-NEXT: blr
				entry:
				%0 = fcmp fast oge <4 x float> %a, %b
				%1 = select <4 x i1> %0, <4 x float> %a, <4 x float> %b
				ret <4 x float> %1
				}

				define <2 x double> @getsmaxf64(<2 x double> %a, <2 x double> %b) {
				; CHECK-LABEL: getsmaxf64:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xvmaxdp 34, 34, 35
				; CHECK-NEXT: blr
				;
				; NOP8VEC-LABEL: getsmaxf64:
				; NOP8VEC: # %bb.0: # %entry
				; NOP8VEC-NEXT: xvmaxdp 34, 34, 35
				; NOP8VEC-NEXT: blr
				entry:
				%0 = fcmp fast oge <2 x double> %a, %b
				%1 = select <2 x i1> %0, <2 x double> %a, <2 x double> %b
				ret <2 x double> %1
				}

				define <16 x i8> @getsmini8(<16 x i8> %a, <16 x i8> %b) {
				; CHECK-LABEL: getsmini8:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vminsb 2, 2, 3
				; CHECK-NEXT: blr
				;
				; NOP8VEC-LABEL: getsmini8:
				; NOP8VEC: # %bb.0: # %entry
				; NOP8VEC-NEXT: vminsb 2, 2, 3
				; NOP8VEC-NEXT: blr
				entry:
				%0 = icmp slt <16 x i8> %a, %b
				%1 = select <16 x i1> %0, <16 x i8> %a, <16 x i8> %b
				ret <16 x i8> %1
				}

				define <8 x i16> @getsmini16(<8 x i16> %a, <8 x i16> %b) {
				; CHECK-LABEL: getsmini16:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vminsh 2, 2, 3
				; CHECK-NEXT: blr
				;
				; NOP8VEC-LABEL: getsmini16:
				; NOP8VEC: # %bb.0: # %entry
				; NOP8VEC-NEXT: vminsh 2, 2, 3
				; NOP8VEC-NEXT: blr
				entry:
				%0 = icmp slt <8 x i16> %a, %b
				%1 = select <8 x i1> %0, <8 x i16> %a, <8 x i16> %b
				ret <8 x i16> %1
				}

				define <4 x i32> @getsmini32(<4 x i32> %a, <4 x i32> %b) {
				; CHECK-LABEL: getsmini32:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vminsw 2, 2, 3
				; CHECK-NEXT: blr
				;
				; NOP8VEC-LABEL: getsmini32:
				; NOP8VEC: # %bb.0: # %entry
				; NOP8VEC-NEXT: vminsw 2, 2, 3
				; NOP8VEC-NEXT: blr
				entry:
				%0 = icmp slt <4 x i32> %a, %b
				%1 = select <4 x i1> %0, <4 x i32> %a, <4 x i32> %b
				ret <4 x i32> %1
				}

				define <2 x i64> @getsmini64(<2 x i64> %a, <2 x i64> %b) {
				; CHECK-LABEL: getsmini64:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vminsd 2, 2, 3
				; CHECK-NEXT: blr
				;
				; NOP8VEC-LABEL: getsmini64:
				; NOP8VEC: # %bb.0: # %entry
				; NOP8VEC-NEXT: xxswapd 0, 35
				; NOP8VEC-NEXT: addi 3, 1, -32
				; NOP8VEC-NEXT: addi 4, 1, -48
				; NOP8VEC-NEXT: xxswapd 1, 34
				; NOP8VEC-NEXT: stxvd2x 0, 0, 3
				; NOP8VEC-NEXT: stxvd2x 1, 0, 4
				; NOP8VEC-NEXT: ld 3, -24(1)
				; NOP8VEC-NEXT: ld 4, -40(1)
				; NOP8VEC-NEXT: cmpd 4, 3
				; NOP8VEC-NEXT: li 3, 0
				; NOP8VEC-NEXT: li 4, -1
				; NOP8VEC-NEXT: isel 5, 4, 3, 0
				; NOP8VEC-NEXT: std 5, -8(1)
				; NOP8VEC-NEXT: ld 5, -32(1)
				; NOP8VEC-NEXT: ld 6, -48(1)
				; NOP8VEC-NEXT: cmpd 6, 5
				; NOP8VEC-NEXT: isel 3, 4, 3, 0
				; NOP8VEC-NEXT: std 3, -16(1)
				; NOP8VEC-NEXT: addi 3, 1, -16
				; NOP8VEC-NEXT: lxvd2x 0, 0, 3
				; NOP8VEC-NEXT: xxswapd 36, 0
				; NOP8VEC-NEXT: xxsel 34, 35, 34, 36
				; NOP8VEC-NEXT: blr
				entry:
				%0 = icmp slt <2 x i64> %a, %b
				%1 = select <2 x i1> %0, <2 x i64> %a, <2 x i64> %b
				ret <2 x i64> %1
				}

				define <4 x float> @getsminf32(<4 x float> %a, <4 x float> %b) {
				; CHECK-LABEL: getsminf32:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xvminsp 34, 34, 35
				; CHECK-NEXT: blr
				;
				; NOP8VEC-LABEL: getsminf32:
				; NOP8VEC: # %bb.0: # %entry
				; NOP8VEC-NEXT: xvminsp 34, 34, 35
				; NOP8VEC-NEXT: blr
				entry:
				%0 = fcmp fast ole <4 x float> %a, %b
				%1 = select <4 x i1> %0, <4 x float> %a, <4 x float> %b
				ret <4 x float> %1
				}

				define <2 x double> @getsminf64(<2 x double> %a, <2 x double> %b) {
				; CHECK-LABEL: getsminf64:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: xvmindp 34, 34, 35
				; CHECK-NEXT: blr
				;
				; NOP8VEC-LABEL: getsminf64:
				; NOP8VEC: # %bb.0: # %entry
				; NOP8VEC-NEXT: xvmindp 34, 34, 35
				; NOP8VEC-NEXT: blr
				entry:
				%0 = fcmp fast ole <2 x double> %a, %b
				%1 = select <2 x i1> %0, <2 x double> %a, <2 x double> %b
				ret <2 x double> %1
				}
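Without P8Vector there is no vmaxsd/vminsd, and the NOP8VEC checks for getsmaxi64 and getsmini64 show the resulting expansion: each 64-bit lane is compared with cmpd, isel materialises -1 or 0 as a per-lane mask, and xxsel blends the two inputs with that mask. The C++ below is a hypothetical scalar model of that strategy (all names are invented for illustration); it models the dataflow only, not the stack stores and reloads in the generated code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V2I64 = std::array<std::int64_t, 2>;
using Mask2 = std::array<std::uint64_t, 2>;

// Per-lane signed compare producing all-ones (true) or all-zeros (false),
// like the cmpd + isel(-1, 0) pairs in the NOP8VEC output.
Mask2 sgtMask(const V2I64 &a, const V2I64 &b) {
  Mask2 m{};
  for (std::size_t i = 0; i < m.size(); ++i)
    m[i] = a[i] > b[i] ? ~std::uint64_t{0} : std::uint64_t{0};
  return m;
}

// Bitwise select: bits from onTrue where the mask is set, else from onFalse.
// This is the role xxsel plays at the end of the expanded sequence.
V2I64 bitSelect(const Mask2 &m, const V2I64 &onTrue, const V2I64 &onFalse) {
  V2I64 r{};
  for (std::size_t i = 0; i < r.size(); ++i) {
    auto t = static_cast<std::uint64_t>(onTrue[i]);
    auto f = static_cast<std::uint64_t>(onFalse[i]);
    r[i] = static_cast<std::int64_t>((t & m[i]) | (f & ~m[i]));
  }
  return r;
}

// smax(a, b) == select(a > b, a, b), the IR form used in getsmaxi64.
V2I64 smaxExpanded(const V2I64 &a, const V2I64 &b) {
  return bitSelect(sgtMask(a, b), a, b);
}
```

On P8 the same IR selects to a single vmaxsd, which is the codegen improvement the CHECK lines record.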


[PowerPC] Exploit the vector min/max instructions (Closed, Public)
