This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Custom legalize splat_vector and disable unprofitable generic DAG combine
Changes Planned · Public

Authored by reames on Sep 16 2022, 12:03 PM.

Details

Reviewers

craig.topper
paulwalker-arm
cameron.mcinally

Summary

The motivation for this patch is to reduce the number of distinct ways we handle splats in the RISCV backend. Before this, we would expand the splat vector into a build_vector for generic IR, but for intrinsics we'd frequently end up emitting a splat_vector during lowering and rely on later legalization. This meant that, depending on the exact test case you looked at, very similar splats could take divergent paths during ISEL.

This change effectively includes a revert of D120328. That transformation is not generally profitable, as it loses the information about the AVL of the vector being inserted. The result is that splats which could be done at a narrow VL are instead done at VLMAX. Given that splats are generally scheduled close to their consuming instruction, this results in widespread regressions on RISCV if splat_vectors make it into DAG combine (i.e. we end up needing to toggle VL repeatedly).
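To make the AVL concern concrete, here is a toy model of the rewrite `insert_subvector undef, (splat X), N2 -> splat X`. It is only a sketch: `Splat`, `InsertIntoUndef`, and both functions are hypothetical stand-ins, not LLVM types or APIs, and the lane count is used as a rough proxy for the VL at which the splat would be materialized.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-ins for DAG nodes; these are NOT LLVM types.
struct Splat {
  int scalar;          // value broadcast to every lane
  std::size_t numElts; // lanes the splat is defined on (proxy for its VL)
};

struct InsertIntoUndef {
  std::size_t wideElts; // lanes in the undef destination vector
  Splat sub;            // splat subvector being inserted
  std::size_t offset;   // insertion index
};

// Without the combine, only the subvector's lanes need to be written.
std::size_t lanesWrittenWithoutCombine(const InsertIntoUndef &n) {
  assert(n.offset + n.sub.numElts <= n.wideElts && "subvector must fit");
  return n.sub.numElts;
}

// The generic rewrite: the result is a splat at the wide type, so the
// narrow lane count of the original subvector is no longer visible.
Splat combineToWideSplat(const InsertIntoUndef &n) {
  assert(n.offset + n.sub.numElts <= n.wideElts && "subvector must fit");
  return Splat{n.sub.scalar, n.wideElts};
}
```

For a 2-lane splat inserted into an 8-lane undef vector, the uncombined form writes 2 lanes, while the combined splat is defined on all 8. In this model that widening is the information loss described above.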

If desired, I could move the removed code into a target DAG combine for AArch64, but before doing that, I was hoping someone would have an idea of how to solve the generic problem in a profitable way. :) I don't know quite enough about AArch64 to know what's needed there, so pointers in the right direction are appreciated.

Event Timeline

reames created this revision. Sep 16 2022, 12:03 PM

Herald added a project: Restricted Project. Sep 16 2022, 12:03 PM

Herald added subscribers: sunshaoce, VincentWu, StephenFan and 33 others.

reames requested review of this revision. Sep 16 2022, 12:03 PM

Herald added subscribers: pcwang-thead, eopXD, MaskRay.

craig.topper added inline comments. Sep 16 2022, 12:09 PM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
Line 812: Does this do anything? RISCVTargetLowering::LowerOperation returns SDValue() for SPLAT_VECTOR unless it has i1 type.
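The question in this inline comment can be illustrated with a toy model of operation legalization. This is a sketch under an assumption: that when an operation is marked Custom and the target hook returns an empty result, the legalizer falls back to its default expansion. `Action`, `Node`, `lowerOperation`, and `legalize` below are hypothetical names for illustration, not LLVM APIs.

```cpp
#include <cassert>
#include <string>

enum class Action { Legal, Custom, Expand };

struct Node {
  std::string opcode;
  bool isI1; // whether the element type is i1
};

// Hypothetical target hook: only i1 SPLAT_VECTOR gets custom lowering.
// Returning false models LowerOperation returning an empty SDValue().
bool lowerOperation(const Node &n, std::string &out) {
  if (n.opcode == "SPLAT_VECTOR" && n.isI1) {
    out = "custom-lowered i1 splat";
    return true;
  }
  return false;
}

// Assumed legalizer behaviour: Custom defers to the hook, and a declined
// hook is handled the same way as Expand.
std::string legalize(const Node &n, Action a) {
  if (a == Action::Legal)
    return "kept as-is";
  if (a == Action::Custom) {
    std::string lowered;
    if (lowerOperation(n, lowered))
      return lowered;
    // Hook declined: treat the node as if it were marked Expand.
  }
  return "default expansion";
}
```

In this model, marking a non-i1 SPLAT_VECTOR as Custom produces the same result as Expand, which is the reason for asking whether the new `setOperationAction` line has any effect. The model does not capture other places where the Custom marking might be observed, such as legality queries made by combines.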

Harbormaster completed remote builds in B187200: Diff 460838. Sep 16 2022, 12:59 PM

Are you able to write tests that show the value of removing the combine for RISC-V? The current ones don't highlight much. In general it seems good to simplify splat operations, as subvector operations get in the way of other combines as well as isel (things like matching immediate operands).

From an AArch64-specific point of view, these extracts are normally removed after operation legalisation (where, for AArch64/SVE, fixed length vectors are lowered to scalable vectors), when you end up with extract/insert subvector pairs. The problematic case is i1 fixed length vectors, which are not type legal, and hence the combine is necessary to catch those cases.

I've no objection to making this combine target-specific, but it does seem like something that is generically good, and without seeing the exact problem it is hard to suggest an alternative.

I need to better explain this.

Matt added a subscriber: Matt. Oct 5 2022, 9:56 AM

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	5 lines
llvm/lib/Target/RISCV/RISCVISelLowering.cpp	4 lines
llvm/test/CodeGen/AArch64/sve-insert-vector.ll	8 lines
llvm/test/CodeGen/RISCV/rvv/vreductions-fp-sdnode.ll	12 lines

Diff 460838

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ 23,322 unchanged lines hidden @@ if (N1.isUndef())
  return N0;

  // If this is an insert of an extracted vector into an undef vector, we can
  // just use the input to the extract.
  if (N0.isUndef() && N1.getOpcode() == ISD::EXTRACT_SUBVECTOR &&
  N1.getOperand(1) == N2 && N1.getOperand(0).getValueType() == VT)
  return N1.getOperand(0);

- // Simplify scalar inserts into an undef vector:
- // insert_subvector undef, (splat X), N2 -> splat X
- if (N0.isUndef() && N1.getOpcode() == ISD::SPLAT_VECTOR)
- return DAG.getNode(ISD::SPLAT_VECTOR, SDLoc(N), VT, N1.getOperand(0));
-
  // If we are inserting a bitcast value into an undef, with the same
  // number of elements, just use the bitcast input of the extract.
  // i.e. INSERT_SUBVECTOR UNDEF (BITCAST N1) N2 ->
  // BITCAST (INSERT_SUBVECTOR UNDEF N1 N2)
  if (N0.isUndef() && N1.getOpcode() == ISD::BITCAST &&
  N1.getOperand(0).getOpcode() == ISD::EXTRACT_SUBVECTOR &&
  N1.getOperand(0).getOperand(1) == N2 &&
  N1.getOperand(0).getOperand(0).getValueType().getVectorElementCount() ==
@@ 1,836 unchanged lines hidden @@

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

@@ 803 unchanged lines hidden @@ if (Subtarget.useRVVForFixedLengthVectors()) {
  VT, Custom);

  setOperationAction({ISD::VP_FP_TO_SINT, ISD::VP_FP_TO_UINT,
  ISD::VP_SETCC, ISD::VP_TRUNCATE},
  VT, Custom);
  continue;
  }

+ setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
+
  // Make SPLAT_VECTOR Legal so DAGCombine will convert splat vectors to
  // it before type legalization for i64 vectors on RV32. It will then be
  // type legalized to SPLAT_VECTOR_PARTS which we need to Custom handle.
- // FIXME: Use SPLAT_VECTOR for all types? DAGCombine probably needs
- // improvements first.
  if (!Subtarget.is64Bit() && VT.getVectorElementType() == MVT::i64) {
  setOperationAction(ISD::SPLAT_VECTOR, VT, Legal);
  setOperationAction(ISD::SPLAT_VECTOR_PARTS, VT, Custom);
  }

  setOperationAction(ISD::VECTOR_SHUFFLE, VT, Custom);
  setOperationAction(ISD::INSERT_VECTOR_ELT, VT, Custom);

@@ 12,431 unchanged lines hidden @@

llvm/test/CodeGen/AArch64/sve-insert-vector.ll

Show First 20 Lines • Show All 669 Lines • ▼ Show 20 Lines	; CHECK-NEXT: ret
%v0 = call <vscale x 16 x i1> @llvm.vector.insert.nxv16i1.nxv4i1(<vscale x 16 x i1> poison, <vscale x 4 x i1> %sv, i64 0)		%v0 = call <vscale x 16 x i1> @llvm.vector.insert.nxv16i1.nxv4i1(<vscale x 16 x i1> poison, <vscale x 4 x i1> %sv, i64 0)
ret <vscale x 16 x i1> %v0		ret <vscale x 16 x i1> %v0
}		}

; Test constant predicate insert into undef		; Test constant predicate insert into undef
define <vscale x 2 x i1> @insert_nxv2i1_v8i1_const_true_into_undef() vscale_range(4,8) {		define <vscale x 2 x i1> @insert_nxv2i1_v8i1_const_true_into_undef() vscale_range(4,8) {
; CHECK-LABEL: insert_nxv2i1_v8i1_const_true_into_undef:		; CHECK-LABEL: insert_nxv2i1_v8i1_const_true_into_undef:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
		; CHECK-NEXT: mov z0.d, #1 // =0x1
; CHECK-NEXT: ptrue p0.d		; CHECK-NEXT: ptrue p0.d
		; CHECK-NEXT: cmpne p0.d, p0/z, z0.d, #0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%v0 = call <vscale x 2 x i1> @llvm.vector.insert.nxv2i1.v8i1 (<vscale x 2 x i1> undef, <8 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, i64 0)		%v0 = call <vscale x 2 x i1> @llvm.vector.insert.nxv2i1.v8i1 (<vscale x 2 x i1> undef, <8 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, i64 0)
ret <vscale x 2 x i1> %v0		ret <vscale x 2 x i1> %v0
}		}

define <vscale x 4 x i1> @insert_nxv4i1_v16i1_const_true_into_undef() vscale_range(4,8) {		define <vscale x 4 x i1> @insert_nxv4i1_v16i1_const_true_into_undef() vscale_range(4,8) {
; CHECK-LABEL: insert_nxv4i1_v16i1_const_true_into_undef:		; CHECK-LABEL: insert_nxv4i1_v16i1_const_true_into_undef:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
		; CHECK-NEXT: mov z0.s, #1 // =0x1
; CHECK-NEXT: ptrue p0.s		; CHECK-NEXT: ptrue p0.s
		; CHECK-NEXT: cmpne p0.s, p0/z, z0.s, #0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%v0 = call <vscale x 4 x i1> @llvm.vector.insert.nxv4i1.v16i1 (<vscale x 4 x i1> undef, <16 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, i64 0)		%v0 = call <vscale x 4 x i1> @llvm.vector.insert.nxv4i1.v16i1 (<vscale x 4 x i1> undef, <16 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, i64 0)
ret <vscale x 4 x i1> %v0		ret <vscale x 4 x i1> %v0
}		}

define <vscale x 8 x i1> @insert_nxv8i1_v32i1_const_true_into_undef() vscale_range(4,8) {		define <vscale x 8 x i1> @insert_nxv8i1_v32i1_const_true_into_undef() vscale_range(4,8) {
; CHECK-LABEL: insert_nxv8i1_v32i1_const_true_into_undef:		; CHECK-LABEL: insert_nxv8i1_v32i1_const_true_into_undef:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
		; CHECK-NEXT: mov z0.h, #1 // =0x1
; CHECK-NEXT: ptrue p0.h		; CHECK-NEXT: ptrue p0.h
		; CHECK-NEXT: cmpne p0.h, p0/z, z0.h, #0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%v0 = call <vscale x 8 x i1> @llvm.vector.insert.nxv8i1.v32i1 (<vscale x 8 x i1> undef, <32 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, i64 0)		%v0 = call <vscale x 8 x i1> @llvm.vector.insert.nxv8i1.v32i1 (<vscale x 8 x i1> undef, <32 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, i64 0)
ret <vscale x 8 x i1> %v0		ret <vscale x 8 x i1> %v0
}		}

define <vscale x 16 x i1> @insert_nxv16i1_v64i1_const_true_into_undef() vscale_range(4,8) {		define <vscale x 16 x i1> @insert_nxv16i1_v64i1_const_true_into_undef() vscale_range(4,8) {
; CHECK-LABEL: insert_nxv16i1_v64i1_const_true_into_undef:		; CHECK-LABEL: insert_nxv16i1_v64i1_const_true_into_undef:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
		; CHECK-NEXT: mov z0.b, #1 // =0x1
; CHECK-NEXT: ptrue p0.b		; CHECK-NEXT: ptrue p0.b
		; CHECK-NEXT: cmpne p0.b, p0/z, z0.b, #0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%v0 = call <vscale x 16 x i1> @llvm.vector.insert.nxv16i1.v64i1 (<vscale x 16 x i1> undef, <64 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, i64 0)		%v0 = call <vscale x 16 x i1> @llvm.vector.insert.nxv16i1.v64i1 (<vscale x 16 x i1> undef, <64 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, i64 0)
ret <vscale x 16 x i1> %v0		ret <vscale x 16 x i1> %v0
}		}

;		;
; Insert nxv1i1 type into: nxv2i1		; Insert nxv1i1 type into: nxv2i1
;		;
▲ Show 20 Lines • Show All 707 Lines • Show Last 20 Lines

llvm/test/CodeGen/RISCV/rvv/vreductions-fp-sdnode.ll

@@ 1,056 unchanged lines hidden @@
  ; CHECK: # %bb.0:
  ; CHECK-NEXT: csrr a0, vlenb
  ; CHECK-NEXT: srli a0, a0, 3
  ; CHECK-NEXT: slli a1, a0, 1
  ; CHECK-NEXT: add a1, a1, a0
  ; CHECK-NEXT: add a0, a1, a0
  ; CHECK-NEXT: fmv.h.x ft0, zero
  ; CHECK-NEXT: fneg.h ft0, ft0
- ; CHECK-NEXT: vsetvli a2, zero, e16, m1, ta, mu
+ ; CHECK-NEXT: vsetvli a2, zero, e16, mf4, ta, mu
  ; CHECK-NEXT: vfmv.v.f v9, ft0
  ; CHECK-NEXT: vsetvli zero, a0, e16, m1, tu, mu
  ; CHECK-NEXT: vslideup.vx v8, v9, a1
  ; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, mu
  ; CHECK-NEXT: vfmv.s.f v9, fa0
  ; CHECK-NEXT: vsetvli a0, zero, e16, m1, ta, mu
  ; CHECK-NEXT: vfredosum.vs v8, v8, v9
  ; CHECK-NEXT: vfmv.f.s fa0, v8
  ; CHECK-NEXT: ret
  %red = call half @llvm.vector.reduce.fadd.nxv3f16(half %s, <vscale x 3 x half> %v)
  ret half %red
  }

  declare half @llvm.vector.reduce.fadd.nxv6f16(half, <vscale x 6 x half>)

  define half @vreduce_ord_fadd_nxv6f16(<vscale x 6 x half> %v, half %s) {
  ; CHECK-LABEL: vreduce_ord_fadd_nxv6f16:
  ; CHECK: # %bb.0:
  ; CHECK-NEXT: csrr a0, vlenb
  ; CHECK-NEXT: srli a0, a0, 2
  ; CHECK-NEXT: add a1, a0, a0
  ; CHECK-NEXT: fmv.h.x ft0, zero
  ; CHECK-NEXT: fneg.h ft0, ft0
- ; CHECK-NEXT: vsetvli a2, zero, e16, m1, ta, mu
+ ; CHECK-NEXT: vsetvli a2, zero, e16, mf2, ta, mu
  ; CHECK-NEXT: vfmv.v.f v10, ft0
  ; CHECK-NEXT: vsetvli zero, a1, e16, m1, tu, mu
  ; CHECK-NEXT: vslideup.vx v9, v10, a0
  ; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, mu
  ; CHECK-NEXT: vfmv.s.f v10, fa0
  ; CHECK-NEXT: vsetvli a0, zero, e16, m2, ta, mu
  ; CHECK-NEXT: vfredosum.vs v8, v8, v10
  ; CHECK-NEXT: vfmv.f.s fa0, v8
  ; CHECK-NEXT: ret
  %red = call half @llvm.vector.reduce.fadd.nxv6f16(half %s, <vscale x 6 x half> %v)
  ret half %red
  }

  declare half @llvm.vector.reduce.fadd.nxv10f16(half, <vscale x 10 x half>)

  define half @vreduce_ord_fadd_nxv10f16(<vscale x 10 x half> %v, half %s) {
  ; CHECK-LABEL: vreduce_ord_fadd_nxv10f16:
  ; CHECK: # %bb.0:
  ; CHECK-NEXT: csrr a0, vlenb
  ; CHECK-NEXT: srli a0, a0, 2
  ; CHECK-NEXT: add a1, a0, a0
  ; CHECK-NEXT: fmv.h.x ft0, zero
  ; CHECK-NEXT: fneg.h ft0, ft0
- ; CHECK-NEXT: vsetvli a2, zero, e16, m1, ta, mu
+ ; CHECK-NEXT: vsetvli a2, zero, e16, mf2, ta, mu
  ; CHECK-NEXT: vfmv.v.f v12, ft0
  ; CHECK-NEXT: vsetvli zero, a1, e16, m1, tu, mu
  ; CHECK-NEXT: vslideup.vx v10, v12, a0
  ; CHECK-NEXT: vsetvli zero, a0, e16, m1, tu, mu
  ; CHECK-NEXT: vslideup.vi v11, v12, 0
  ; CHECK-NEXT: vsetvli zero, a1, e16, m1, tu, mu
  ; CHECK-NEXT: vslideup.vx v11, v12, a0
  ; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, mu
@@ 31 unchanged lines hidden @@
  ; CHECK: # %bb.0:
  ; CHECK-NEXT: csrr a0, vlenb
  ; CHECK-NEXT: srli a0, a0, 3
  ; CHECK-NEXT: slli a1, a0, 1
  ; CHECK-NEXT: add a1, a1, a0
  ; CHECK-NEXT: add a0, a1, a0
  ; CHECK-NEXT: fmv.h.x ft0, zero
  ; CHECK-NEXT: fneg.h ft0, ft0
- ; CHECK-NEXT: vsetvli a2, zero, e16, m1, ta, mu
+ ; CHECK-NEXT: vsetvli a2, zero, e16, mf4, ta, mu
  ; CHECK-NEXT: vfmv.v.f v9, ft0
  ; CHECK-NEXT: vsetvli zero, a0, e16, m1, tu, mu
  ; CHECK-NEXT: vslideup.vx v8, v9, a1
  ; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, mu
  ; CHECK-NEXT: vfmv.s.f v9, fa0
  ; CHECK-NEXT: vsetvli a0, zero, e16, m1, ta, mu
  ; CHECK-NEXT: vfredusum.vs v8, v8, v9
  ; CHECK-NEXT: vfmv.f.s fa0, v8
  ; CHECK-NEXT: ret
  %red = call reassoc half @llvm.vector.reduce.fadd.nxv3f16(half %s, <vscale x 3 x half> %v)
  ret half %red
  }

  define half @vreduce_fadd_nxv6f16(<vscale x 6 x half> %v, half %s) {
  ; CHECK-LABEL: vreduce_fadd_nxv6f16:
  ; CHECK: # %bb.0:
  ; CHECK-NEXT: csrr a0, vlenb
  ; CHECK-NEXT: srli a0, a0, 2
  ; CHECK-NEXT: add a1, a0, a0
  ; CHECK-NEXT: fmv.h.x ft0, zero
  ; CHECK-NEXT: fneg.h ft0, ft0
- ; CHECK-NEXT: vsetvli a2, zero, e16, m1, ta, mu
+ ; CHECK-NEXT: vsetvli a2, zero, e16, mf2, ta, mu
  ; CHECK-NEXT: vfmv.v.f v10, ft0
  ; CHECK-NEXT: vsetvli zero, a1, e16, m1, tu, mu
  ; CHECK-NEXT: vslideup.vx v9, v10, a0
  ; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, mu
  ; CHECK-NEXT: vfmv.s.f v10, fa0
  ; CHECK-NEXT: vsetvli a0, zero, e16, m2, ta, mu
  ; CHECK-NEXT: vfredusum.vs v8, v8, v10
  ; CHECK-NEXT: vfmv.f.s fa0, v8
  ; CHECK-NEXT: ret
  %red = call reassoc half @llvm.vector.reduce.fadd.nxv6f16(half %s, <vscale x 6 x half> %v)
  ret half %red
  }

  declare half @llvm.vector.reduce.fmin.nxv10f16(<vscale x 10 x half>)

  define half @vreduce_fmin_nxv10f16(<vscale x 10 x half> %v) {
  ; CHECK-LABEL: vreduce_fmin_nxv10f16:
  ; CHECK: # %bb.0:
  ; CHECK-NEXT: csrr a0, vlenb
  ; CHECK-NEXT: lui a1, %hi(.LCPI73_0)
  ; CHECK-NEXT: flh ft0, %lo(.LCPI73_0)(a1)
  ; CHECK-NEXT: srli a0, a0, 2
  ; CHECK-NEXT: add a1, a0, a0
- ; CHECK-NEXT: vsetvli a2, zero, e16, m1, ta, mu
+ ; CHECK-NEXT: vsetvli a2, zero, e16, mf2, ta, mu
  ; CHECK-NEXT: vfmv.v.f v12, ft0
  ; CHECK-NEXT: vsetvli zero, a1, e16, m1, tu, mu
  ; CHECK-NEXT: vslideup.vx v10, v12, a0
  ; CHECK-NEXT: vsetvli zero, a0, e16, m1, tu, mu
  ; CHECK-NEXT: vslideup.vi v11, v12, 0
  ; CHECK-NEXT: vsetvli zero, a1, e16, m1, tu, mu
  ; CHECK-NEXT: vslideup.vx v11, v12, a0
  ; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, mu
@@ 27 unchanged lines hidden @@
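
The only change in these RISC-V checks is the LMUL on the `vsetvli` that precedes the splat (`m1` becoming `mf4` or `mf2`). As a rough aid for reading that change, the sketch below computes VLMAX from the RVV relation VLMAX = LMUL * VLEN / SEW. The helper name `vlmax` and the VLEN of 128 bits used in the note afterwards are assumptions for illustration; the tests themselves do not fix a VLEN.

```cpp
#include <cassert>

// VLMAX = LMUL * VLEN / SEW, with LMUL expressed as lmulNum / lmulDen
// so that fractional settings such as mf2 (1/2) and mf4 (1/4) fit.
// Hypothetical helper, not part of LLVM.
unsigned vlmax(unsigned vlenBits, unsigned sewBits, unsigned lmulNum,
               unsigned lmulDen) {
  assert(sewBits != 0 && lmulDen != 0 &&
         "SEW and LMUL denominator must be non-zero");
  return (vlenBits * lmulNum) / (sewBits * lmulDen);
}
```

Under the assumed VLEN of 128, `e16, m1` gives 8 lanes, `e16, mf2` gives 4, and `e16, mf4` gives 2, so the splat feeding the slide-up covers fewer lanes on the new side of the diff.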