This is an archive of the discontinued LLVM Phabricator instance.

Also, while I have everyone's attention, there are a number of unhandled vector reduction intrinsics with SVE support. Do we want to add lowerings for those? E.g. ANDV.

Harbormaster completed remote builds in B73225: Diff 294789. Sep 28 2020, 1:22 PM

In D88444#2299066, @cameron.mcinally wrote:

Also, while I have everyone's attention, there are a number of unhandled vector reduction intrinsics with SVE support. Do we want to add lowerings for those? E.g. ANDV.

Yes please.

paulwalker-arm added inline comments. Sep 29 2020, 3:50 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9667–9669	There are f16 MAX/MIN reduction instructions for NEON, it's just they're an optional v8.2 extension. However, when SVE is implemented the extension is mandatory so we can rely on them. I did a quick test using your tests against master and they just worked.
llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll
199	What's going on here?
211	And here?
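Paul's point reduces to a selection rule that the CHECK lines in sve-fixed-length-fp-reduce.ll encode: vectors of 128 bits or less stay on NEON (f16 included, since per the comment above the v8.2 FP16 extension is mandatory whenever SVE is implemented), and wider vectors use SVE only when -aarch64-sve-vector-bits-min guarantees they fit. The sketch below models that rule as a standalone function. It is not LLVM code; `Lowering` and `expectedFMaxLowering` are hypothetical names, and the `Other` case is left vague on purpose because the tests do not check it.

```cpp
#include <cassert>

// Hypothetical model (not LLVM code) of which instruction family the
// CHECK lines expect for an FP max/min reduction of <numElts x fN>.
enum class Lowering {
  None,         // single element: nothing to reduce (v1f64)
  NeonPairwise, // fmaxnmp s0, v0.2s / fmaxnmp d0, v0.2d
  NeonAcross,   // fmaxnmv h0, v0.4h / h0, v0.8h / s0, v0.4s
  SVE,          // ptrue + ld1 + predicated fmaxnmv
  Other         // wider than the guaranteed SVE length; not checked
};

// eltBits: 16, 32 or 64. minSVEBits: value of -aarch64-sve-vector-bits-min.
Lowering expectedFMaxLowering(unsigned numElts, unsigned eltBits,
                              unsigned minSVEBits) {
  const unsigned vecBits = numElts * eltBits;
  if (numElts == 1)
    return Lowering::None;
  if (vecBits <= 128) {
    // 64- and 128-bit vectors stay on NEON, f16 included, because the
    // FP16 extension can be assumed once SVE is available.
    return numElts == 2 ? Lowering::NeonPairwise : Lowering::NeonAcross;
  }
  if (vecBits <= minSVEBits)
    return Lowering::SVE;
  return Lowering::Other;
}
```

For example, `<4 x half>` maps to `NeonAcross` for every SVE width, while `<64 x half>` only maps to `SVE` once the minimum vector length reaches 1024 bits.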

Address reviews...

cameron.mcinally added inline comments. Sep 29 2020, 7:44 AM

llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll
199	Fatigue error. Botched the CHECK line copy-and-paste and missed it. Sorry about that.

cameron.mcinally added inline comments. Sep 29 2020, 8:14 AM

llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll
199	Pffff. Looks like this copy-and-paste problem has history. Correcting other tests now...

cameron.mcinally added inline comments. Sep 29 2020, 8:33 AM

llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll
199	Fixed with 01c95f79424d.

paulwalker-arm added inline comments. Sep 29 2020, 9:03 AM

llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll
69	1024
131	1024
142	2048
259	1024
322	1024
333	2048
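The corrected values follow from one multiplication: the VBITS_GE_<N> prefix guarding a test should name the vector's total width, i.e. element count times element size. `<64 x half>` and `<32 x float>` are 64 × 16 = 32 × 32 = 1024 bits, and `<64 x float>` is 64 × 32 = 2048 bits, so a prefix such as VBITS_GE_1048 or VBITS_GE_2096 never matches any RUN line. A hypothetical helper (not part of the test suite) that makes the rule explicit:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the FileCheck prefix guarding a fixed-length SVE
// test is named after the vector's total width in bits.
std::string requiredPrefix(unsigned numElts, unsigned eltBits) {
  return "VBITS_GE_" + std::to_string(numElts * eltBits);
}
```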

Update more typos...

Sorry again, Paul. Still looking at how far this propagated. Looks like it was introduced with the VECREDUC_ADD patch. Need some more time...

Ok, I think that's all of them. Looks like it started with D87796 and was buried in other changes. To confirm:

<scrubbed> CodeGen/AArch64> grep -rn VBITS_GE_1048 *
<scrubbed> CodeGen/AArch64> grep -rn VBITS_GE_2086 *
<scrubbed> CodeGen/AArch64> grep -rn VBITS_GE_2096 *
<scrubbed> CodeGen/AArch64>
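The grep commands above only catch misspellings that are already known. A slightly more general check, sketched here as a hypothetical standalone function rather than an existing LLVM tool, flags any VBITS_GE_<N> prefix whose N is not a power of two. It cannot catch a typo that happens to be another power of two (for example VBITS_GE_1024 where VBITS_GE_2048 was meant).

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Returns every VBITS_GE_<N> occurrence in `text` whose N is not a power
// of two, e.g. the VBITS_GE_1048 and VBITS_GE_2096 typos.
std::vector<std::string> badVBitsPrefixes(const std::string &text) {
  static const std::regex prefix("VBITS_GE_([0-9]+)");
  std::vector<std::string> bad;
  for (std::sregex_iterator it(text.begin(), text.end(), prefix), end;
       it != end; ++it) {
    const unsigned long n = std::stoul((*it)[1].str());
    const bool powerOfTwo = n != 0 && (n & (n - 1)) == 0;
    if (!powerOfTwo)
      bad.push_back(it->str());
  }
  return bad;
}
```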

Sorry again. I shouldn't have done that.

@cameron.mcinally No worries, I clearly didn't do the best job reviewing that patch either.

This revision is now accepted and ready to land. Sep 29 2020, 9:45 AM

Closed by commit rG80381c4dc925: [SVE] Lower fixed length VECREDUCE_[FMAX|FMIN] to Scalable (authored by cameron.mcinally). Sep 29 2020, 2:31 PM

This revision was automatically updated to reflect the committed changes.

cameron.mcinally added a commit: rG80381c4dc925: [SVE] Lower fixed length VECREDUCE_[FMAX|FMIN] to Scalable.

Revision Contents

Path                                                      Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp           10 lines
llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll   445 lines

Diff 294987

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


 void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT) {
 ...
   setOperationAction(ISD::SRL, VT, Custom);
   setOperationAction(ISD::STORE, VT, Custom);
   setOperationAction(ISD::SUB, VT, Custom);
   setOperationAction(ISD::TRUNCATE, VT, Custom);
   setOperationAction(ISD::UDIV, VT, Custom);
   setOperationAction(ISD::UMAX, VT, Custom);
   setOperationAction(ISD::UMIN, VT, Custom);
   setOperationAction(ISD::VECREDUCE_ADD, VT, Custom);
+  setOperationAction(ISD::VECREDUCE_FMAX, VT, Custom);
+  setOperationAction(ISD::VECREDUCE_FMIN, VT, Custom);
   setOperationAction(ISD::VECREDUCE_SMAX, VT, Custom);
   setOperationAction(ISD::VECREDUCE_SMIN, VT, Custom);
   setOperationAction(ISD::VECREDUCE_UMAX, VT, Custom);
   setOperationAction(ISD::VECREDUCE_UMIN, VT, Custom);
   setOperationAction(ISD::VSELECT, VT, Custom);
   setOperationAction(ISD::XOR, VT, Custom);
   setOperationAction(ISD::ZERO_EXTEND, VT, Custom);
 }
 ...
 }

 SDValue AArch64TargetLowering::LowerVECREDUCE(SDValue Op,
                                               SelectionDAG &DAG) const {
   SDValue Src = Op.getOperand(0);

   // Try to lower fixed length reductions to SVE.
   EVT SrcVT = Src.getValueType();
-  bool OverrideNEON = SrcVT.getVectorElementType() == MVT::i64 &&
-                      Op.getOpcode() != ISD::VECREDUCE_ADD;
+  bool OverrideNEON = Op.getOpcode() != ISD::VECREDUCE_ADD &&
+                      SrcVT.getVectorElementType() == MVT::i64;
   if (useSVEForFixedLengthVectorVT(SrcVT, OverrideNEON)) {
     switch (Op.getOpcode()) {
     case ISD::VECREDUCE_ADD:
       return LowerFixedLengthReductionToSVE(AArch64ISD::UADDV_PRED, Op, DAG);
     case ISD::VECREDUCE_SMAX:
       return LowerFixedLengthReductionToSVE(AArch64ISD::SMAXV_PRED, Op, DAG);
     case ISD::VECREDUCE_SMIN:
       return LowerFixedLengthReductionToSVE(AArch64ISD::SMINV_PRED, Op, DAG);
     case ISD::VECREDUCE_UMAX:
       return LowerFixedLengthReductionToSVE(AArch64ISD::UMAXV_PRED, Op, DAG);
     case ISD::VECREDUCE_UMIN:
       return LowerFixedLengthReductionToSVE(AArch64ISD::UMINV_PRED, Op, DAG);
+    case ISD::VECREDUCE_FMAX:
+      return LowerFixedLengthReductionToSVE(AArch64ISD::FMAXNMV_PRED, Op, DAG);
+    case ISD::VECREDUCE_FMIN:
+      return LowerFixedLengthReductionToSVE(AArch64ISD::FMINNMV_PRED, Op, DAG);
     default:
       llvm_unreachable("Unhandled fixed length reduction");
     }
   }

   // Lower NEON reductions.
   SDLoc dl(Op);
   switch (Op.getOpcode()) {
 ...
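The diff maps ISD::VECREDUCE_FMAX and ISD::VECREDUCE_FMIN onto the predicated FMAXNMV/FMINNMV nodes, the "NM" (maxNum/minNum) forms, which return the numeric operand when the other one is a quiet NaN. The tests load the fixed-length vector under a `ptrue ..., vlN` predicate, so only the first N lanes take part in the reduction. The scalar model below illustrates those two properties under stated assumptions: it is not LLVM code, `referenceFMaxNMV` and `referenceFMinNMV` are hypothetical names, and it ignores signalling NaNs and the ordering of signed zeros.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Scalar reference model (an assumption, not LLVM code) of a predicated
// FMAXNMV: only the first `activeLanes` elements participate, mirroring
// `ptrue pN.s, vlN`. std::fmax returns the other operand when exactly one
// argument is NaN, so a quiet NaN lane is skipped.
float referenceFMaxNMV(const std::vector<float> &lanes,
                       std::size_t activeLanes) {
  float acc = std::numeric_limits<float>::quiet_NaN(); // identity for fmax
  for (std::size_t i = 0; i < activeLanes && i < lanes.size(); ++i)
    acc = std::fmax(acc, lanes[i]);
  return acc; // NaN only if every active lane was NaN (or none were active)
}

// Same model for FMINNMV, using std::fmin.
float referenceFMinNMV(const std::vector<float> &lanes,
                       std::size_t activeLanes) {
  float acc = std::numeric_limits<float>::quiet_NaN(); // identity for fmin
  for (std::size_t i = 0; i < activeLanes && i < lanes.size(); ++i)
    acc = std::fmin(acc, lanes[i]);
  return acc;
}
```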

llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll

This file was added.

				; RUN: llc -aarch64-sve-vector-bits-min=128 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=16 -check-prefix=NO_SVE
				; RUN: llc -aarch64-sve-vector-bits-min=256 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=32 -check-prefixes=CHECK,VBITS_EQ_256
				; RUN: llc -aarch64-sve-vector-bits-min=384 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=32 -check-prefixes=CHECK
				; RUN: llc -aarch64-sve-vector-bits-min=512 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=640 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=768 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=896 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=1024 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
				; RUN: llc -aarch64-sve-vector-bits-min=1152 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
				; RUN: llc -aarch64-sve-vector-bits-min=1280 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
				; RUN: llc -aarch64-sve-vector-bits-min=1408 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
				; RUN: llc -aarch64-sve-vector-bits-min=1536 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
				; RUN: llc -aarch64-sve-vector-bits-min=1664 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
				; RUN: llc -aarch64-sve-vector-bits-min=1792 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
				; RUN: llc -aarch64-sve-vector-bits-min=1920 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
				; RUN: llc -aarch64-sve-vector-bits-min=2048 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=256 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024,VBITS_GE_2048

				target triple = "aarch64-unknown-linux-gnu"

				; Don't use SVE when its registers are no bigger than NEON.
				; NO_SVE-NOT: ptrue

				;
				; FMAXV
				;

				; Don't use SVE for 64-bit f16 vectors.
				define half @fmaxv_v4f16(<4 x half> %a) #0 {
				; CHECK-LABEL: fmaxv_v4f16:
				; CHECK: fmaxnmv h0, v0.4h
				; CHECK-NEXT: ret
				%res = call half @llvm.experimental.vector.reduce.fmax.v4f16(<4 x half> %a)
				ret half %res
				}

				; Don't use SVE for 128-bit f16 vectors.
				define half @fmaxv_v8f16(<8 x half> %a) #0 {
				; CHECK-LABEL: fmaxv_v8f16:
				; CHECK: fmaxnmv h0, v0.8h
				; CHECK-NEXT: ret
				%res = call half @llvm.experimental.vector.reduce.fmax.v8f16(<8 x half> %a)
				ret half %res
				}

				define half @fmaxv_v16f16(<16 x half>* %a) #0 {
				; CHECK-LABEL: fmaxv_v16f16:
				; VBITS_GE_256: ptrue [[PG:p[0-9]+]].h, vl16
				; VBITS_GE_256-NEXT: ld1h { [[OP:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_256-NEXT: fmaxnmv h0, [[PG]], [[OP]].h
				; VBITS_GE_256-NEXT: ret
				%op = load <16 x half>, <16 x half>* %a
				%res = call half @llvm.experimental.vector.reduce.fmax.v16f16(<16 x half> %op)
				ret half %res
				}

				define half @fmaxv_v32f16(<32 x half>* %a) #0 {
				; CHECK-LABEL: fmaxv_v32f16:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].h, vl32
				; VBITS_GE_512-NEXT: ld1h { [[OP:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_512-NEXT: fmaxnmv h0, [[PG]], [[OP]].h
				; VBITS_GE_512-NEXT: ret
				%op = load <32 x half>, <32 x half>* %a
				%res = call half @llvm.experimental.vector.reduce.fmax.v32f16(<32 x half> %op)
				ret half %res
				}

				define half @fmaxv_v64f16(<64 x half>* %a) #0 {
				; CHECK-LABEL: fmaxv_v64f16:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].h, vl64
				; VBITS_GE_1024-NEXT: ld1h { [[OP:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_1024-NEXT: fmaxnmv h0, [[PG]], [[OP]].h
				; VBITS_GE_1024-NEXT: ret
				%op = load <64 x half>, <64 x half>* %a
				%res = call half @llvm.experimental.vector.reduce.fmax.v64f16(<64 x half> %op)
				ret half %res
				}

				define half @fmaxv_v128f16(<128 x half>* %a) #0 {
				; CHECK-LABEL: fmaxv_v128f16:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].h, vl128
				; VBITS_GE_2048-NEXT: ld1h { [[OP:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_2048-NEXT: fmaxnmv h0, [[PG]], [[OP]].h
				; VBITS_GE_2048-NEXT: ret
				%op = load <128 x half>, <128 x half>* %a
				%res = call half @llvm.experimental.vector.reduce.fmax.v128f16(<128 x half> %op)
				ret half %res
				}

				; Don't use SVE for 64-bit f32 vectors.
				define float @fmaxv_v2f32(<2 x float> %a) #0 {
				; CHECK-LABEL: fmaxv_v2f32:
				; CHECK: fmaxnmp s0, v0.2s
				; CHECK: ret
				%res = call float @llvm.experimental.vector.reduce.fmax.v2f32(<2 x float> %a)
				ret float %res
				}

				; Don't use SVE for 128-bit f32 vectors.
				define float @fmaxv_v4f32(<4 x float> %a) #0 {
				; CHECK-LABEL: fmaxv_v4f32:
				; CHECK: fmaxnmv s0, v0.4s
				; CHECK: ret
				%res = call float @llvm.experimental.vector.reduce.fmax.v4f32(<4 x float> %a)
				ret float %res
				}

				define float @fmaxv_v8f32(<8 x float>* %a) #0 {
				; CHECK-LABEL: fmaxv_v8f32:
				; VBITS_GE_256: ptrue [[PG:p[0-9]+]].s, vl8
				; VBITS_GE_256-NEXT: ld1w { [[OP:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_256-NEXT: fmaxnmv s0, [[PG]], [[OP]].s
				; VBITS_GE_256-NEXT: ret
				%op = load <8 x float>, <8 x float>* %a
				%res = call float @llvm.experimental.vector.reduce.fmax.v8f32(<8 x float> %op)
				ret float %res
				}

				define float @fmaxv_v16f32(<16 x float>* %a) #0 {
				; CHECK-LABEL: fmaxv_v16f32:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].s, vl16
				; VBITS_GE_512-NEXT: ld1w { [[OP:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_512-NEXT: fmaxnmv s0, [[PG]], [[OP]].s
				; VBITS_GE_512-NEXT: ret
				%op = load <16 x float>, <16 x float>* %a
				%res = call float @llvm.experimental.vector.reduce.fmax.v16f32(<16 x float> %op)
				ret float %res
				}

				define float @fmaxv_v32f32(<32 x float>* %a) #0 {
				; CHECK-LABEL: fmaxv_v32f32:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].s, vl32
				; VBITS_GE_1024-NEXT: ld1w { [[OP:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_1024-NEXT: fmaxnmv s0, [[PG]], [[OP]].s
				; VBITS_GE_1024-NEXT: ret
				%op = load <32 x float>, <32 x float>* %a
				%res = call float @llvm.experimental.vector.reduce.fmax.v32f32(<32 x float> %op)
				ret float %res
				}

				define float @fmaxv_v64f32(<64 x float>* %a) #0 {
				; CHECK-LABEL: fmaxv_v64f32:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].s, vl64
				; VBITS_GE_2048-NEXT: ld1w { [[OP:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_2048-NEXT: fmaxnmv s0, [[PG]], [[OP]].s
				; VBITS_GE_2048-NEXT: ret
				%op = load <64 x float>, <64 x float>* %a
				%res = call float @llvm.experimental.vector.reduce.fmax.v64f32(<64 x float> %op)
				ret float %res
				}

				; Nothing to do for single element vectors.
				define double @fmaxv_v1f64(<1 x double> %a) #0 {
				; CHECK-LABEL: fmaxv_v1f64:
				; CHECK-NOT: fmax
				; CHECK: ret
				%res = call double @llvm.experimental.vector.reduce.fmax.v1f64(<1 x double> %a)
				ret double %res
				}

				; Don't use SVE for 128-bit f64 vectors.
				define double @fmaxv_v2f64(<2 x double> %a) #0 {
				; CHECK-LABEL: fmaxv_v2f64:
				; CHECK: fmaxnmp d0, v0.2d
				; CHECK-NEXT: ret
				%res = call double @llvm.experimental.vector.reduce.fmax.v2f64(<2 x double> %a)
				ret double %res
				}

				define double @fmaxv_v4f64(<4 x double>* %a) #0 {
				; CHECK-LABEL: fmaxv_v4f64:
				; VBITS_GE_256: ptrue [[PG:p[0-9]+]].d, vl4
				; VBITS_GE_256-NEXT: ld1d { [[OP:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_256-NEXT: fmaxnmv d0, [[PG]], [[OP]].d
				; VBITS_GE_256-NEXT: ret
				%op = load <4 x double>, <4 x double>* %a
				%res = call double @llvm.experimental.vector.reduce.fmax.v4f64(<4 x double> %op)
				ret double %res
				}

				define double @fmaxv_v8f64(<8 x double>* %a) #0 {
				; CHECK-LABEL: fmaxv_v8f64:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].d, vl8
				; VBITS_GE_512-NEXT: ld1d { [[OP:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_512-NEXT: fmaxnmv d0, [[PG]], [[OP]].d
				; VBITS_GE_512-NEXT: ret
				%op = load <8 x double>, <8 x double>* %a
				%res = call double @llvm.experimental.vector.reduce.fmax.v8f64(<8 x double> %op)
				ret double %res
				}

				define double @fmaxv_v16f64(<16 x double>* %a) #0 {
				; CHECK-LABEL: fmaxv_v16f64:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].d, vl16
				; VBITS_GE_1024-NEXT: ld1d { [[OP:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_1024-NEXT: fmaxnmv d0, [[PG]], [[OP]].d
				; VBITS_GE_1024-NEXT: ret
				%op = load <16 x double>, <16 x double>* %a
				%res = call double @llvm.experimental.vector.reduce.fmax.v16f64(<16 x double> %op)
				ret double %res
				}

				define double @fmaxv_v32f64(<32 x double>* %a) #0 {
				; CHECK-LABEL: fmaxv_v32f64:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].d, vl32
				; VBITS_GE_2048-NEXT: ld1d { [[OP:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_2048-NEXT: fmaxnmv d0, [[PG]], [[OP]].d
				; VBITS_GE_2048-NEXT: ret
				%op = load <32 x double>, <32 x double>* %a
				%res = call double @llvm.experimental.vector.reduce.fmax.v32f64(<32 x double> %op)
				ret double %res
				}

				;
				; FMINV
				;

				; Don't use SVE for 64-bit f16 vectors.
				define half @fminv_v4f16(<4 x half> %a) #0 {
				; CHECK-LABEL: fminv_v4f16:
				; CHECK: fminnmv h0, v0.4h
				; CHECK-NEXT: ret
				%res = call half @llvm.experimental.vector.reduce.fmin.v4f16(<4 x half> %a)
				ret half %res
				}

				; Don't use SVE for 128-bit f16 vectors.
				define half @fminv_v8f16(<8 x half> %a) #0 {
				; CHECK-LABEL: fminv_v8f16:
				; CHECK: fminnmv h0, v0.8h
				; CHECK-NEXT: ret
				%res = call half @llvm.experimental.vector.reduce.fmin.v8f16(<8 x half> %a)
				ret half %res
				}

				define half @fminv_v16f16(<16 x half>* %a) #0 {
				; CHECK-LABEL: fminv_v16f16:
				; VBITS_GE_256: ptrue [[PG:p[0-9]+]].h, vl16
				; VBITS_GE_256-NEXT: ld1h { [[OP:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_256-NEXT: fminnmv h0, [[PG]], [[OP]].h
				; VBITS_GE_256-NEXT: ret
				%op = load <16 x half>, <16 x half>* %a
				%res = call half @llvm.experimental.vector.reduce.fmin.v16f16(<16 x half> %op)
				ret half %res
				}

				define half @fminv_v32f16(<32 x half>* %a) #0 {
				; CHECK-LABEL: fminv_v32f16:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].h, vl32
				; VBITS_GE_512-NEXT: ld1h { [[OP:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_512-NEXT: fminnmv h0, [[PG]], [[OP]].h
				; VBITS_GE_512-NEXT: ret
				%op = load <32 x half>, <32 x half>* %a
				%res = call half @llvm.experimental.vector.reduce.fmin.v32f16(<32 x half> %op)
				ret half %res
				}

				define half @fminv_v64f16(<64 x half>* %a) #0 {
				; CHECK-LABEL: fminv_v64f16:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].h, vl64
				; VBITS_GE_1024-NEXT: ld1h { [[OP:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_1024-NEXT: fminnmv h0, [[PG]], [[OP]].h
				; VBITS_GE_1024-NEXT: ret
				%op = load <64 x half>, <64 x half>* %a
				%res = call half @llvm.experimental.vector.reduce.fmin.v64f16(<64 x half> %op)
				ret half %res
				}

				define half @fminv_v128f16(<128 x half>* %a) #0 {
				; CHECK-LABEL: fminv_v128f16:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].h, vl128
				; VBITS_GE_2048-NEXT: ld1h { [[OP:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_2048-NEXT: fminnmv h0, [[PG]], [[OP]].h
				; VBITS_GE_2048-NEXT: ret
				%op = load <128 x half>, <128 x half>* %a
				%res = call half @llvm.experimental.vector.reduce.fmin.v128f16(<128 x half> %op)
				ret half %res
				}

				; Don't use SVE for 64-bit f32 vectors.
				define float @fminv_v2f32(<2 x float> %a) #0 {
				; CHECK-LABEL: fminv_v2f32:
				; CHECK: fminnmp s0, v0.2s
				; CHECK: ret
				%res = call float @llvm.experimental.vector.reduce.fmin.v2f32(<2 x float> %a)
				ret float %res
				}

				; Don't use SVE for 128-bit f32 vectors.
				define float @fminv_v4f32(<4 x float> %a) #0 {
				; CHECK-LABEL: fminv_v4f32:
				; CHECK: fminnmv s0, v0.4s
				; CHECK: ret
				%res = call float @llvm.experimental.vector.reduce.fmin.v4f32(<4 x float> %a)
				ret float %res
				}

				define float @fminv_v8f32(<8 x float>* %a) #0 {
				; CHECK-LABEL: fminv_v8f32:
				; VBITS_GE_256: ptrue [[PG:p[0-9]+]].s, vl8
				; VBITS_GE_256-NEXT: ld1w { [[OP:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_256-NEXT: fminnmv s0, [[PG]], [[OP]].s
				; VBITS_GE_256-NEXT: ret
				%op = load <8 x float>, <8 x float>* %a
				%res = call float @llvm.experimental.vector.reduce.fmin.v8f32(<8 x float> %op)
				ret float %res
				}

				define float @fminv_v16f32(<16 x float>* %a) #0 {
				; CHECK-LABEL: fminv_v16f32:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].s, vl16
				; VBITS_GE_512-NEXT: ld1w { [[OP:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_512-NEXT: fminnmv s0, [[PG]], [[OP]].s
				; VBITS_GE_512-NEXT: ret
				%op = load <16 x float>, <16 x float>* %a
				%res = call float @llvm.experimental.vector.reduce.fmin.v16f32(<16 x float> %op)
				ret float %res
				}

				define float @fminv_v32f32(<32 x float>* %a) #0 {
				; CHECK-LABEL: fminv_v32f32:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].s, vl32
				; VBITS_GE_1024-NEXT: ld1w { [[OP:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_1024-NEXT: fminnmv s0, [[PG]], [[OP]].s
				; VBITS_GE_1024-NEXT: ret
				%op = load <32 x float>, <32 x float>* %a
				%res = call float @llvm.experimental.vector.reduce.fmin.v32f32(<32 x float> %op)
				ret float %res
				}

				define float @fminv_v64f32(<64 x float>* %a) #0 {
				; CHECK-LABEL: fminv_v64f32:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].s, vl64
				; VBITS_GE_2048-NEXT: ld1w { [[OP:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_2048-NEXT: fminnmv s0, [[PG]], [[OP]].s
				; VBITS_GE_2048-NEXT: ret
				%op = load <64 x float>, <64 x float>* %a
				%res = call float @llvm.experimental.vector.reduce.fmin.v64f32(<64 x float> %op)
				ret float %res
				}

				; Nothing to do for single element vectors.
				define double @fminv_v1f64(<1 x double> %a) #0 {
				; CHECK-LABEL: fminv_v1f64:
				; CHECK-NOT: fmin
				; CHECK: ret
				%res = call double @llvm.experimental.vector.reduce.fmin.v1f64(<1 x double> %a)
				ret double %res
				}

				; Don't use SVE for 128-bit f64 vectors.
				define double @fminv_v2f64(<2 x double> %a) #0 {
				; CHECK-LABEL: fminv_v2f64:
				; CHECK: fminnmp d0, v0.2d
				; CHECK-NEXT: ret
				%res = call double @llvm.experimental.vector.reduce.fmin.v2f64(<2 x double> %a)
				ret double %res
				}

				define double @fminv_v4f64(<4 x double>* %a) #0 {
				; CHECK-LABEL: fminv_v4f64:
				; VBITS_GE_256: ptrue [[PG:p[0-9]+]].d, vl4
				; VBITS_GE_256-NEXT: ld1d { [[OP:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_256-NEXT: fminnmv d0, [[PG]], [[OP]].d
				; VBITS_GE_256-NEXT: ret
				%op = load <4 x double>, <4 x double>* %a
				%res = call double @llvm.experimental.vector.reduce.fmin.v4f64(<4 x double> %op)
				ret double %res
				}

				define double @fminv_v8f64(<8 x double>* %a) #0 {
				; CHECK-LABEL: fminv_v8f64:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].d, vl8
				; VBITS_GE_512-NEXT: ld1d { [[OP:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_512-NEXT: fminnmv d0, [[PG]], [[OP]].d
				; VBITS_GE_512-NEXT: ret
				%op = load <8 x double>, <8 x double>* %a
				%res = call double @llvm.experimental.vector.reduce.fmin.v8f64(<8 x double> %op)
				ret double %res
				}

				define double @fminv_v16f64(<16 x double>* %a) #0 {
				; CHECK-LABEL: fminv_v16f64:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].d, vl16
				; VBITS_GE_1024-NEXT: ld1d { [[OP:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_1024-NEXT: fminnmv d0, [[PG]], [[OP]].d
				; VBITS_GE_1024-NEXT: ret
				%op = load <16 x double>, <16 x double>* %a
				%res = call double @llvm.experimental.vector.reduce.fmin.v16f64(<16 x double> %op)
				ret double %res
				}

				define double @fminv_v32f64(<32 x double>* %a) #0 {
				; CHECK-LABEL: fminv_v32f64:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].d, vl32
				; VBITS_GE_2048-NEXT: ld1d { [[OP:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_2048-NEXT: fminnmv d0, [[PG]], [[OP]].d
				; VBITS_GE_2048-NEXT: ret
				%op = load <32 x double>, <32 x double>* %a
				%res = call double @llvm.experimental.vector.reduce.fmin.v32f64(<32 x double> %op)
				ret double %res
				}

				attributes #0 = { "target-features"="+sve" }

				declare half @llvm.experimental.vector.reduce.fmax.v4f16(<4 x half>)
				declare half @llvm.experimental.vector.reduce.fmax.v8f16(<8 x half>)
				declare half @llvm.experimental.vector.reduce.fmax.v16f16(<16 x half>)
				declare half @llvm.experimental.vector.reduce.fmax.v32f16(<32 x half>)
				declare half @llvm.experimental.vector.reduce.fmax.v64f16(<64 x half>)
				declare half @llvm.experimental.vector.reduce.fmax.v128f16(<128 x half>)

				declare float @llvm.experimental.vector.reduce.fmax.v2f32(<2 x float>)
				declare float @llvm.experimental.vector.reduce.fmax.v4f32(<4 x float>)
				declare float @llvm.experimental.vector.reduce.fmax.v8f32(<8 x float>)
				declare float @llvm.experimental.vector.reduce.fmax.v16f32(<16 x float>)
				declare float @llvm.experimental.vector.reduce.fmax.v32f32(<32 x float>)
				declare float @llvm.experimental.vector.reduce.fmax.v64f32(<64 x float>)

				declare double @llvm.experimental.vector.reduce.fmax.v1f64(<1 x double>)
				declare double @llvm.experimental.vector.reduce.fmax.v2f64(<2 x double>)
				declare double @llvm.experimental.vector.reduce.fmax.v4f64(<4 x double>)
				declare double @llvm.experimental.vector.reduce.fmax.v8f64(<8 x double>)
				declare double @llvm.experimental.vector.reduce.fmax.v16f64(<16 x double>)
				declare double @llvm.experimental.vector.reduce.fmax.v32f64(<32 x double>)

				declare half @llvm.experimental.vector.reduce.fmin.v4f16(<4 x half>)
				declare half @llvm.experimental.vector.reduce.fmin.v8f16(<8 x half>)
				declare half @llvm.experimental.vector.reduce.fmin.v16f16(<16 x half>)
				declare half @llvm.experimental.vector.reduce.fmin.v32f16(<32 x half>)
				declare half @llvm.experimental.vector.reduce.fmin.v64f16(<64 x half>)
				declare half @llvm.experimental.vector.reduce.fmin.v128f16(<128 x half>)

				declare float @llvm.experimental.vector.reduce.fmin.v2f32(<2 x float>)
				declare float @llvm.experimental.vector.reduce.fmin.v4f32(<4 x float>)
				declare float @llvm.experimental.vector.reduce.fmin.v8f32(<8 x float>)
				declare float @llvm.experimental.vector.reduce.fmin.v16f32(<16 x float>)
				declare float @llvm.experimental.vector.reduce.fmin.v32f32(<32 x float>)
				declare float @llvm.experimental.vector.reduce.fmin.v64f32(<64 x float>)

				declare double @llvm.experimental.vector.reduce.fmin.v1f64(<1 x double>)
				declare double @llvm.experimental.vector.reduce.fmin.v2f64(<2 x double>)
				declare double @llvm.experimental.vector.reduce.fmin.v4f64(<4 x double>)
				declare double @llvm.experimental.vector.reduce.fmin.v8f64(<8 x double>)
				declare double @llvm.experimental.vector.reduce.fmin.v16f64(<16 x double>)
				declare double @llvm.experimental.vector.reduce.fmin.v32f64(<32 x double>)
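The RUN lines at the top of this test follow a regular pattern: VBYTES is the largest power of two not exceeding the minimum vector length in bytes (384 bits gives 32, 1920 bits gives 128), and each VBITS_GE_<N> prefix is enabled once -aarch64-sve-vector-bits-min reaches N. The sketch reproduces that table. `vbytesFor` and `prefixesFor` are hypothetical helpers derived by reading the RUN lines; they are not part of lit or FileCheck.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Largest power of two <= minBits / 8 (e.g. 384 -> 32, 1920 -> 128).
unsigned vbytesFor(unsigned minBits) {
  const unsigned bytes = minBits / 8;
  unsigned p = 1;
  while (p * 2 <= bytes)
    p *= 2;
  return p;
}

// Check prefixes enabled by the RUN line for a given minimum SVE width.
std::vector<std::string> prefixesFor(unsigned minBits) {
  if (minBits < 256)
    return {"NO_SVE"};
  std::vector<std::string> prefixes = {"CHECK"};
  if (minBits == 256)
    prefixes.push_back("VBITS_EQ_256");
  for (unsigned width = 512; width <= 2048 && width <= minBits; width *= 2)
    prefixes.push_back("VBITS_GE_" + std::to_string(width));
  return prefixes;
}
```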


[SVE] Lower fixed length VECREDUCE_[FMAX|FMIN] to Scalable (Closed, Public)
