This is an archive of the discontinued LLVM Phabricator instance.


Differential D127710

[SelectionDAG] Enable WidenVecOp_VECREDUCE_SEQ for scalable vector
ClosedPublic

Authored by Jimerlife on Jun 13 2022, 8:40 PM.

Details

Reviewers

craig.topper
frasercrmck
sdesmalen
benshi001

Commits

rGab25e263a99b: [SelectionDAG] Enable WidenVecOp_VECREDUCE_SEQ for scalable vector

Summary

Enable WidenVecOp_VECREDUCE_SEQ for scalable vectors by padding the widened operand with splat vectors of the neutral element.
For example, when widening <vscale x 10 x half> to <vscale x 16 x half>, an nxv2f16 splat of the neutral element is inserted at indices 10, 12, and 14.
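
The index pattern in the example follows from stepping through the padding region in chunks of gcd(OrigElts, WideElts) minimum elements, which keeps every insertion index a multiple of the chunk size. The following is a minimal standalone sketch of that arithmetic only. `gcdOf` and `paddingInsertIndices` are hypothetical names used for illustration, not LLVM APIs, and no SelectionDAG nodes are built.

```cpp
#include <cassert>
#include <vector>

// Euclid's algorithm. Stands in for the greatestCommonDivisor call used in
// the patch; the name gcdOf is an assumption made for this sketch.
unsigned gcdOf(unsigned A, unsigned B) {
  while (B != 0) {
    unsigned T = A % B;
    A = B;
    B = T;
  }
  return A;
}

// Hypothetical helper (not an LLVM API): returns the minimum-element indices
// at which a splat of gcdOf(OrigElts, WideElts) neutral elements would be
// inserted to fill lanes [OrigElts, WideElts).
std::vector<unsigned> paddingInsertIndices(unsigned OrigElts,
                                           unsigned WideElts) {
  assert(OrigElts > 0 && OrigElts <= WideElts && "widening cannot shrink");
  unsigned Step = gcdOf(OrigElts, WideElts);
  std::vector<unsigned> Indices;
  for (unsigned Idx = OrigElts; Idx < WideElts; Idx += Step)
    Indices.push_back(Idx);
  return Indices;
}
```

With OrigElts = 10 and WideElts = 16 the chunk size is 2 and the indices are 10, 12, and 14, as in the summary. With OrigElts = 12 and WideElts = 16 the chunk size is 4, so a single insert at index 12 fills the tail.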

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Jimerlife created this revision. Jun 13 2022, 8:40 PM

Herald added a project: Restricted Project. Jun 13 2022, 8:40 PM

Herald added subscribers: luke957, StephenFan, ecnelises and 21 others.

Jimerlife requested review of this revision. Jun 13 2022, 8:40 PM

Herald added subscribers: llvm-commits, alextsao1999, pcwang-thead and 2 others. Jun 13 2022, 8:40 PM

Harbormaster completed remote builds in B169609: Diff 436631. Jun 13 2022, 9:23 PM

craig.topper retitled this revision from "[SelectionDAG] Make WidenVecOp_VECREDUCE_SEQ enbale for scalable vector" to "[SelectionDAG] Enable WidenVecOp_VECREDUCE_SEQ for scalable vector". Jun 14 2022, 9:40 AM

craig.topper edited the summary of this revision.

craig.topper added inline comments. Jun 14 2022, 9:45 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
6152	Why was this blank line removed?

sdesmalen added inline comments. Jun 14 2022, 10:05 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
6151	I may be missing something here, but I'm not sure I understand the need for a `GCD` or for a loop that generates `WideVT.getVectorMinNumElements()` `INSERT_SUBVECTOR`s. Should this code not just insert `VecOp` into a `splat(NeutralElem)` of type `WideVT` using INSERT_SUBVECTOR? i.e. vecreduce_seq(<vscale x 10 x half> op) <=> vecreduce_seq( insert_subvector(op, <vscale x 16 x half> splat(%neutral.elem), /idx=/0)) where `<vscale x 10 x half>` is `OrigVT` and `<vscale x 16 x half>` is `WideVT`.

craig.topper added inline comments. Jun 14 2022, 10:32 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
6151	I think you meant `insert_subvector(<vscale x 16 x half> splat(%neutral.elem), op, /idx=/0)`. `op` would be the subvector so should be the second argument. That would need to go through WidenVecOp_INSERT_SUBVECTOR, but it will fail because `InVec` isn't undef.

Jimerlife added inline comments. Jun 14 2022, 7:24 PM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
6151	If we do it like this, `vecreduce_seq(insert_subvector(<vscale x 16 x half> splat, vecop, /idx=/0))`, it would go through WidenVecOp_INSERT_SUBVECTOR because `vecop` needs to be widened. But WidenVecOp_INSERT_SUBVECTOR currently only supports inserting a subvector into an undefined vector.
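
The two formulations discussed in this thread produce the same padded vector; they differ only in which legalization path they exercise. The sketch below is a toy model on plain arrays for one fixed vscale, and it does not use any LLVM API. `Lanes`, `insertSubvector`, `padSingleInsert`, `padChunked`, and the `Garbage` parameter are all hypothetical names introduced here. It assumes that widening keeps the original elements in the low lanes and that a scalable insertion index is multiplied by vscale.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Lanes = std::vector<float>;

// Toy model of insert_subvector(Base, Sub, Idx): overwrite the lanes of Base
// starting at Idx with the lanes of Sub. Note the operand order: the vector
// being inserted into comes first, the subvector second.
Lanes insertSubvector(Lanes Base, const Lanes &Sub, std::size_t Idx) {
  assert(Idx + Sub.size() <= Base.size() && "subvector must fit");
  for (std::size_t I = 0; I < Sub.size(); ++I)
    Base[Idx + I] = Sub[I];
  return Base;
}

// Reviewer's formulation: insert the operand into a full-width splat of the
// neutral element at index 0.
Lanes padSingleInsert(const Lanes &Op, unsigned WideElts, unsigned VScale,
                      float Neutral) {
  return insertSubvector(Lanes(WideElts * VScale, Neutral), Op, 0);
}

// Patch's formulation: start from the widened operand, whose tail lanes hold
// arbitrary data (modelled by Garbage), and overwrite the tail chunk by chunk
// with splats of the neutral element.
Lanes padChunked(const Lanes &Op, unsigned OrigElts, unsigned WideElts,
                 unsigned Chunk, unsigned VScale, float Neutral,
                 float Garbage) {
  Lanes Wide = insertSubvector(Lanes(WideElts * VScale, Garbage), Op, 0);
  const Lanes Splat(Chunk * VScale, Neutral);
  for (unsigned Idx = OrigElts; Idx < WideElts; Idx += Chunk)
    Wide = insertSubvector(Wide, Splat, std::size_t(Idx) * VScale);
  return Wide;
}
```

For OrigElts = 10, WideElts = 16, and Chunk = 2, both functions return identical lanes for any vscale tried in this model, which is the equivalence the reviewers agree on. The chunked form only ever inserts into the already widened operand, never into a splat.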

address comment

Jimerlife added inline comments. Jun 14 2022, 7:34 PM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
6152	I deleted the blank line by mistake. It is restored now.

Harbormaster completed remote builds in B169892: Diff 437005. Jun 14 2022, 8:14 PM

sdesmalen added inline comments. Jun 15 2022, 6:35 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
6151	Thanks, that makes sense.
6152	For SVE there is still the problem that the type action of SplatVT may be `TypeWiden` as well, because we haven't implemented all support for nxv1 types yet. The approach taken here seems valid though, it just means that we can't have the same tests for SVE yet.
llvm/test/CodeGen/RISCV/rvv/vreductions-fp-sdnode.ll
1081	Could you also add `@vreduce_ord_fadd_nxv6f16` and `@vreduce_ord_fadd_nxv10f16` to llvm/test/CodeGen/AArch64/vecreduce-fadd-legalization-strict.ll? (And then just re-generate the CHECK lines using the `update_llc_test_checks.py` script.)

sdesmalen added inline comments. Jun 15 2022, 6:48 AM

llvm/test/CodeGen/RISCV/rvv/vreductions-fp-sdnode.ll
1081	And `@vreduce_ord_fadd_nxv12f16` as well.

Jimerlife added inline comments. Jun 15 2022, 7:45 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
6152	Do I need to add a `getTypeAction(SplatVT) != TargetLowering::TypeWidenVector` check to avoid this problem for now?

sdesmalen added inline comments. Jun 15 2022, 7:49 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
6152	No I don't think that's necessary, but thanks for the suggestion! When we fix the nxv1 legalisation of insert_subvector, we'll get the support added in this patch for free. And the compiler will currently crash with an error either way :) It was more just an observation.

address comment and add test for AArch64 SVE

Jimerlife added inline comments. Jun 16 2022, 12:27 AM

llvm/test/CodeGen/RISCV/rvv/vreductions-fp-sdnode.ll
1081	The vecreduce-fadd-legalization-strict.ll file only covers NEON, so I added the tests to sve-fp-reduce.ll instead. Is that OK?

Harbormaster completed remote builds in B170201: Diff 437460. Jun 16 2022, 1:01 AM

Matt added a subscriber: Matt. Jun 16 2022, 4:40 PM

LGTM!

llvm/test/CodeGen/RISCV/rvv/vreductions-fp-sdnode.ll
1081	Thanks, I hadn't seen that. That's fine!

This revision is now accepted and ready to land. Jun 17 2022, 5:26 AM

This revision was landed with ongoing or failed builds. Jun 19 2022, 11:31 PM

Closed by commit rGab25e263a99b: [SelectionDAG] Enable WidenVecOp_VECREDUCE_SEQ for scalable vector (authored by Jimerlife).

This revision was automatically updated to reflect the committed changes.

Jimerlife added a commit: rGab25e263a99b: [SelectionDAG] Enable WidenVecOp_VECREDUCE_SEQ for scalable vector.

Jimerlife mentioned this in D128239: [SelectionDAG] Enable WidenVecOp_VECREDUCE for scalable vector. Jun 20 2022, 7:39 PM

Revision Contents

Path                                                    Size
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp   16 lines
llvm/test/CodeGen/AArch64/sve-fp-reduce.ll              76 lines
llvm/test/CodeGen/RISCV/rvv/vreductions-fp-sdnode.ll    102 lines

Diff 438261

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

[... 6,138 lines not shown; in SDValue DAGTypeLegalizer::WidenVecOp_VECREDUCE_SEQ(SDNode *N) ...]
   EVT ElemVT = OrigVT.getVectorElementType();
   SDNodeFlags Flags = N->getFlags();

   unsigned Opc = N->getOpcode();
   unsigned BaseOpc = ISD::getVecReduceBaseOpcode(Opc);
   SDValue NeutralElem = DAG.getNeutralElement(BaseOpc, dl, ElemVT, Flags);

   // Pad the vector with the neutral element.
-  unsigned OrigElts = OrigVT.getVectorNumElements();
-  unsigned WideElts = WideVT.getVectorNumElements();
+  unsigned OrigElts = OrigVT.getVectorMinNumElements();
+  unsigned WideElts = WideVT.getVectorMinNumElements();
+
+  if (WideVT.isScalableVector()) {
+    unsigned GCD = greatestCommonDivisor(OrigElts, WideElts);
+    EVT SplatVT = EVT::getVectorVT(*DAG.getContext(), ElemVT,
+                                   ElementCount::getScalable(GCD));
+    SDValue SplatNeutral = DAG.getSplatVector(SplatVT, dl, NeutralElem);
+    for (unsigned Idx = OrigElts; Idx < WideElts; Idx = Idx + GCD)
+      Op = DAG.getNode(ISD::INSERT_SUBVECTOR, dl, WideVT, Op, SplatNeutral,
+                       DAG.getVectorIdxConstant(Idx, dl));
+    return DAG.getNode(Opc, dl, N->getValueType(0), AccOp, Op, Flags);
+  }

   for (unsigned Idx = OrigElts; Idx < WideElts; Idx++)
     Op = DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, WideVT, Op, NeutralElem,
                      DAG.getVectorIdxConstant(Idx, dl));

   return DAG.getNode(Opc, dl, N->getValueType(0), AccOp, Op, Flags);
 }

 SDValue DAGTypeLegalizer::WidenVecOp_VP_REDUCE(SDNode *N) {
   assert(N->isVPOpcode() && "Expected VP opcode");

   SDLoc dl(N);
   SDValue Op = GetWidenedVector(N->getOperand(1));
[... 508 lines not shown ...]

llvm/test/CodeGen/AArch64/sve-fp-reduce.ll

[... 23 lines not shown ...]
 ; CHECK-LABEL: fadda_nxv8f16:
 ; CHECK: ptrue p0.h
 ; CHECK-NEXT: fadda h0, p0, h0, z1.h
 ; CHECK-NEXT: ret
   %res = call half @llvm.vector.reduce.fadd.nxv8f16(half %init, <vscale x 8 x half> %a)
   ret half %res
 }

+define half @fadda_nxv6f16(<vscale x 6 x half> %v, half %s) {
+; CHECK-LABEL: fadda_nxv6f16:
+; CHECK: str x29, [sp, #-16]!
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: .cfi_offset w29, -16
+; CHECK-NEXT: addvl sp, sp, #-1
+; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22
+; CHECK-NEXT: adrp x8, .LCPI3_0
+; CHECK-NEXT: add x8, x8, :lo12:.LCPI3_0
+; CHECK-NEXT: ptrue p0.h
+; CHECK-NEXT: ptrue p1.d
+; CHECK-NEXT: st1h { z0.h }, p0, [sp]
+; CHECK-NEXT: ld1rh { z0.d }, p1/z, [x8]
+; CHECK-NEXT: st1h { z0.d }, p1, [sp, #3, mul vl]
+; CHECK-NEXT: fmov s0, s1
+; CHECK-NEXT: ld1h { z2.h }, p0/z, [sp]
+; CHECK-NEXT: fadda h0, p0, h0, z2.h
+; CHECK-NEXT: addvl sp, sp, #1
+; CHECK-NEXT: ldr x29, [sp], #16
+; CHECK-NEXT: ret
+  %res = call half @llvm.vector.reduce.fadd.nxv6f16(half %s, <vscale x 6 x half> %v)
+  ret half %res
+}
+
+define half @fadda_nxv10f16(<vscale x 10 x half> %v, half %s) {
+; CHECK-LABEL: fadda_nxv10f16:
+; CHECK: str x29, [sp, #-16]!
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: .cfi_offset w29, -16
+; CHECK-NEXT: addvl sp, sp, #-3
+; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x18, 0x92, 0x2e, 0x00, 0x1e, 0x22
+; CHECK-NEXT: adrp x8, .LCPI4_0
+; CHECK-NEXT: add x8, x8, :lo12:.LCPI4_0
+; CHECK-NEXT: ptrue p0.h
+; CHECK-NEXT: ptrue p1.d
+; CHECK-NEXT: st1h { z1.h }, p0, [sp]
+; CHECK-NEXT: ld1rh { z1.d }, p1/z, [x8]
+; CHECK-NEXT: addvl x8, sp, #1
+; CHECK-NEXT: fadda h2, p0, h2, z0.h
+; CHECK-NEXT: st1h { z1.d }, p1, [sp, #1, mul vl]
+; CHECK-NEXT: ld1h { z3.h }, p0/z, [sp]
+; CHECK-NEXT: st1h { z3.h }, p0, [sp, #1, mul vl]
+; CHECK-NEXT: st1h { z1.d }, p1, [sp, #6, mul vl]
+; CHECK-NEXT: ld1h { z3.h }, p0/z, [sp, #1, mul vl]
+; CHECK-NEXT: st1h { z3.h }, p0, [sp, #2, mul vl]
+; CHECK-NEXT: st1h { z1.d }, p1, [x8, #7, mul vl]
+; CHECK-NEXT: ld1h { z1.h }, p0/z, [sp, #2, mul vl]
+; CHECK-NEXT: fadda h2, p0, h2, z1.h
+; CHECK-NEXT: fmov s0, s2
+; CHECK-NEXT: addvl sp, sp, #3
+; CHECK-NEXT: ldr x29, [sp], #16
+; CHECK-NEXT: ret
+  %res = call half @llvm.vector.reduce.fadd.nxv10f16(half %s, <vscale x 10 x half> %v)
+  ret half %res
+}
+
+define half @fadda_nxv12f16(<vscale x 12 x half> %v, half %s) {
+; CHECK-LABEL: fadda_nxv12f16:
+; CHECK: adrp x8, .LCPI5_0
+; CHECK-NEXT: add x8, x8, :lo12:.LCPI5_0
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: uunpklo z1.s, z1.h
+; CHECK-NEXT: ld1rh { z3.s }, p0/z, [x8]
+; CHECK-NEXT: ptrue p0.h
+; CHECK-NEXT: fadda h2, p0, h2, z0.h
+; CHECK-NEXT: uzp1 z1.h, z1.h, z3.h
+; CHECK-NEXT: fadda h2, p0, h2, z1.h
+; CHECK-NEXT: fmov s0, s2
+; CHECK-NEXT: ret
+  %res = call half @llvm.vector.reduce.fadd.nxv12f16(half %s, <vscale x 12 x half> %v)
+  ret half %res
+}
+
 define float @fadda_nxv2f32(float %init, <vscale x 2 x float> %a) {
 ; CHECK-LABEL: fadda_nxv2f32:
 ; CHECK: ptrue p0.d
 ; CHECK-NEXT: fadda s0, p0, s0, z1.s
 ; CHECK-NEXT: ret
   %res = call float @llvm.vector.reduce.fadd.nxv2f32(float %init, <vscale x 2 x float> %a)
   ret float %res
 }
[... 188 lines not shown ...]
 ; CHECK-NEXT: ret
   %res = call double @llvm.vector.reduce.fmin.nxv2f64(<vscale x 2 x double> %a)
   ret double %res
 }

 declare half @llvm.vector.reduce.fadd.nxv2f16(half, <vscale x 2 x half>)
 declare half @llvm.vector.reduce.fadd.nxv4f16(half, <vscale x 4 x half>)
 declare half @llvm.vector.reduce.fadd.nxv8f16(half, <vscale x 8 x half>)
+declare half @llvm.vector.reduce.fadd.nxv6f16(half, <vscale x 6 x half>)
+declare half @llvm.vector.reduce.fadd.nxv10f16(half, <vscale x 10 x half>)
+declare half @llvm.vector.reduce.fadd.nxv12f16(half, <vscale x 12 x half>)
 declare float @llvm.vector.reduce.fadd.nxv2f32(float, <vscale x 2 x float>)
 declare float @llvm.vector.reduce.fadd.nxv4f32(float, <vscale x 4 x float>)
 declare double @llvm.vector.reduce.fadd.nxv2f64(double, <vscale x 2 x double>)

 declare half @llvm.vector.reduce.fmax.nxv2f16(<vscale x 2 x half>)
 declare half @llvm.vector.reduce.fmax.nxv4f16(<vscale x 4 x half>)
 declare half @llvm.vector.reduce.fmax.nxv8f16(<vscale x 8 x half>)
 declare float @llvm.vector.reduce.fmax.nxv2f32(<vscale x 2 x float>)
[... 9 lines not shown ...]

llvm/test/CodeGen/RISCV/rvv/vreductions-fp-sdnode.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -mattr=+d,+zfh,+experimental-zvfh,+v -target-abi=ilp32d \
+; RUN: llc -mtriple=riscv32 -mattr=+d,+zfh,+experimental-zvfh,+v,+m -target-abi=ilp32d \
 ; RUN: -verify-machineinstrs < %s | FileCheck %s
-; RUN: llc -mtriple=riscv64 -mattr=+d,+zfh,+experimental-zvfh,+v -target-abi=lp64d \
+; RUN: llc -mtriple=riscv64 -mattr=+d,+zfh,+experimental-zvfh,+v,+m -target-abi=lp64d \
 ; RUN: -verify-machineinstrs < %s | FileCheck %s

 declare half @llvm.vector.reduce.fadd.nxv1f16(half, <vscale x 1 x half>)

 define half @vreduce_fadd_nxv1f16(<vscale x 1 x half> %v, half %s) {
 ; CHECK-LABEL: vreduce_fadd_nxv1f16:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, mu
[... 1,030 lines not shown ...]
 ; CHECK-NEXT: vfmv.s.f v9, fa0
 ; CHECK-NEXT: vsetvli a0, zero, e32, mf2, ta, mu
 ; CHECK-NEXT: vfredusum.vs v8, v8, v9
 ; CHECK-NEXT: vfmv.f.s fa0, v8
 ; CHECK-NEXT: ret
   %red = call reassoc nsz float @llvm.vector.reduce.fadd.nxv1f32(float %s, <vscale x 1 x float> %v)
   ret float %red
 }
+
+; Test Widen VECREDUCE_SEQ_FADD
+declare half @llvm.vector.reduce.fadd.nxv3f16(half, <vscale x 3 x half>)
+
+define half @vreduce_ord_fadd_nxv3f16(<vscale x 3 x half> %v, half %s) {
+; CHECK-LABEL: vreduce_ord_fadd_nxv3f16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: csrr a0, vlenb
+; CHECK-NEXT: srli a0, a0, 3
+; CHECK-NEXT: slli a1, a0, 1
+; CHECK-NEXT: add a1, a1, a0
+; CHECK-NEXT: add a0, a1, a0
+; CHECK-NEXT: fmv.h.x ft0, zero
+; CHECK-NEXT: fneg.h ft0, ft0
+; CHECK-NEXT: vsetvli a2, zero, e16, m1, ta, mu
+; CHECK-NEXT: vfmv.v.f v9, ft0
+; CHECK-NEXT: vsetvli zero, a0, e16, m1, tu, mu
+; CHECK-NEXT: vslideup.vx v8, v9, a1
+; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, mu
+; CHECK-NEXT: vfmv.s.f v9, fa0
+; CHECK-NEXT: vsetvli a0, zero, e16, m1, ta, mu
+; CHECK-NEXT: vfredosum.vs v8, v8, v9
+; CHECK-NEXT: vfmv.f.s fa0, v8
+; CHECK-NEXT: ret
+  %red = call half @llvm.vector.reduce.fadd.nxv3f16(half %s, <vscale x 3 x half> %v)
+  ret half %red
+}
+
+declare half @llvm.vector.reduce.fadd.nxv6f16(half, <vscale x 6 x half>)
+
+define half @vreduce_ord_fadd_nxv6f16(<vscale x 6 x half> %v, half %s) {
+; CHECK-LABEL: vreduce_ord_fadd_nxv6f16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: csrr a0, vlenb
+; CHECK-NEXT: srli a0, a0, 2
+; CHECK-NEXT: add a1, a0, a0
+; CHECK-NEXT: fmv.h.x ft0, zero
+; CHECK-NEXT: fneg.h ft0, ft0
+; CHECK-NEXT: vsetvli a2, zero, e16, m1, ta, mu
+; CHECK-NEXT: vfmv.v.f v10, ft0
+; CHECK-NEXT: vsetvli zero, a1, e16, m1, tu, mu
+; CHECK-NEXT: vslideup.vx v9, v10, a0
+; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, mu
+; CHECK-NEXT: vfmv.s.f v10, fa0
+; CHECK-NEXT: vsetvli a0, zero, e16, m2, ta, mu
+; CHECK-NEXT: vfredosum.vs v8, v8, v10
+; CHECK-NEXT: vfmv.f.s fa0, v8
+; CHECK-NEXT: ret
+  %red = call half @llvm.vector.reduce.fadd.nxv6f16(half %s, <vscale x 6 x half> %v)
+  ret half %red
+}
+
+declare half @llvm.vector.reduce.fadd.nxv10f16(half, <vscale x 10 x half>)
+
+define half @vreduce_ord_fadd_nxv10f16(<vscale x 10 x half> %v, half %s) {
+; CHECK-LABEL: vreduce_ord_fadd_nxv10f16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: csrr a0, vlenb
+; CHECK-NEXT: srli a0, a0, 2
+; CHECK-NEXT: add a1, a0, a0
+; CHECK-NEXT: fmv.h.x ft0, zero
+; CHECK-NEXT: fneg.h ft0, ft0
+; CHECK-NEXT: vsetvli a2, zero, e16, m1, ta, mu
+; CHECK-NEXT: vfmv.v.f v12, ft0
+; CHECK-NEXT: vsetvli zero, a1, e16, m1, tu, mu
+; CHECK-NEXT: vslideup.vx v10, v12, a0
+; CHECK-NEXT: vsetvli zero, a0, e16, m1, tu, mu
+; CHECK-NEXT: vslideup.vi v11, v12, 0
+; CHECK-NEXT: vsetvli zero, a1, e16, m1, tu, mu
+; CHECK-NEXT: vslideup.vx v11, v12, a0
+; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, mu
+; CHECK-NEXT: vfmv.s.f v12, fa0
+; CHECK-NEXT: vsetvli a0, zero, e16, m4, ta, mu
+; CHECK-NEXT: vfredosum.vs v8, v8, v12
+; CHECK-NEXT: vfmv.f.s fa0, v8
+; CHECK-NEXT: ret
+  %red = call half @llvm.vector.reduce.fadd.nxv10f16(half %s, <vscale x 10 x half> %v)
+  ret half %red
+}
+
+declare half @llvm.vector.reduce.fadd.nxv12f16(half, <vscale x 12 x half>)
+
+define half @vreduce_ord_fadd_nxv12f16(<vscale x 12 x half> %v, half %s) {
+; CHECK-LABEL: vreduce_ord_fadd_nxv12f16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, mu
+; CHECK-NEXT: vfmv.s.f v12, fa0
+; CHECK-NEXT: fmv.h.x ft0, zero
+; CHECK-NEXT: fneg.h ft0, ft0
+; CHECK-NEXT: vsetvli a0, zero, e16, m1, ta, mu
+; CHECK-NEXT: vfmv.v.f v11, ft0
+; CHECK-NEXT: vsetvli a0, zero, e16, m4, ta, mu
+; CHECK-NEXT: vfredosum.vs v8, v8, v12
+; CHECK-NEXT: vfmv.f.s fa0, v8
+; CHECK-NEXT: ret
+  %red = call half @llvm.vector.reduce.fadd.nxv12f16(half %s, <vscale x 12 x half> %v)
+  ret half %red
+}
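
The RISC-V CHECK lines above build the padding value with `fmv.h.x ft0, zero` followed by `fneg.h ft0, ft0`, which suggests that the neutral element used for the ordered fadd is -0.0 rather than +0.0. The sketch below models why that choice matters, under the assumption of default IEEE-754 round-to-nearest arithmetic and no fast-math flags. `orderedFAdd` and `padWith` are hypothetical helpers written for this illustration. They use `float` instead of `half` and are not part of LLVM.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Ordered (sequential) floating-point add reduction:
// ((Acc + V[0]) + V[1]) + ... in lane order.
float orderedFAdd(float Acc, const std::vector<float> &V) {
  for (float X : V)
    Acc = Acc + X;
  return Acc;
}

// Appends Count copies of Pad to V, modelling the padding lanes that
// widening adds after the original elements.
std::vector<float> padWith(std::vector<float> V, std::size_t Count,
                           float Pad) {
  V.insert(V.end(), Count, Pad);
  return V;
}
```

In this model, padding with -0.0 leaves every result unchanged, including a result of -0.0. Padding with +0.0 would turn a -0.0 result into +0.0, because (-0.0) + (+0.0) is +0.0 under round-to-nearest.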