This is an archive of the discontinued LLVM Phabricator instance.

llvm/test/CodeGen/AArch64/vecreduce-fmax-legalization.ll
Line 3: Hi, instead of having two RUN lines here, would it be better to just use function attributes, i.e. for test_v4f16 have a function attribute that adds "fullfp16" support? That way you only need one RUN line and can avoid all the additional FP16 check lines too, since most of them seem to be the same as the first RUN line.
llvm/test/CodeGen/AArch64/vecreduce-fmin-legalization.ll
Line 68: It looks like you've added FP16 checks here without the additional RUN line. Similar to my comment in the fmax case, perhaps you can just use function attributes instead?

fhahn added a subscriber: fhahn. Mar 3 2021, 1:32 AM

fhahn added inline comments.

llvm/test/CodeGen/AArch64/vecreduce-fmax-legalization.ll
Line 3: You could also use multiple check prefixes to avoid repeating checks for both RUN lines when they are equal, e.g. `FileCheck %s --check-prefix=CHECK --check-prefix=NOFP16 ...` and `FileCheck %s --check-prefix=CHECK --check-prefix=FP16 ...`. With that, we should only need separate `FP16`/`NOFP16` check lines for functions where there is a difference.
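fhahn's suggestion relies on one FileCheck rule: in a given RUN, FileCheck verifies a check line only when that line's prefix is among the prefixes passed to that RUN. `--check-prefix` can be repeated, or the prefixes can be given as a list with `--check-prefixes=CHECK,FP16`. Output shared by both configurations can therefore sit under `CHECK`, and only the functions that differ need `FP16`/`NOFP16` lines.

The toy C++ model below sketches only that selection rule. It is a simplification, not FileCheck's implementation:
- `directivePrefix` and `activeChecks` are hypothetical helpers.
- Only the `-NEXT` and `-LABEL` suffixes are recognised.
- Matching against llc output is not modelled.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not FileCheck code): extract the directive prefix of a
// check line, e.g. "; FP16-NEXT: ret" -> "FP16". Returns "" for other lines.
std::string directivePrefix(const std::string &line) {
  const std::string lead = "; ";
  if (line.rfind(lead, 0) != 0)
    return "";
  std::size_t colon = line.find(':', lead.size());
  if (colon == std::string::npos)
    return "";
  std::string tag = line.substr(lead.size(), colon - lead.size());
  // Strip the directive suffixes used by these tests.
  static const std::vector<std::string> suffixes = {"-NEXT", "-LABEL"};
  for (const std::string &suffix : suffixes) {
    if (tag.size() > suffix.size() &&
        tag.compare(tag.size() - suffix.size(), suffix.size(), suffix) == 0)
      return tag.substr(0, tag.size() - suffix.size());
  }
  return tag;
}

// Keep only the check lines that a RUN line with the given active prefixes
// would verify, e.g. {"CHECK", "FP16"} for the +fullfp16 RUN line.
std::vector<std::string> activeChecks(const std::vector<std::string> &lines,
                                      const std::set<std::string> &prefixes) {
  std::vector<std::string> out;
  for (const std::string &line : lines)
    if (prefixes.count(directivePrefix(line)) != 0)
      out.push_back(line);
  return out;
}
```

Take a file with shared `CHECK` lines for `test_v3f32` plus `NOFP16`/`FP16` lines for `test_v4f16`. Calling `activeChecks(lines, {"CHECK", "FP16"})` keeps the shared lines and the `fmaxnmv h0, v0.4h` line, and skips the `NOFP16` expansion.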

LemonBoy added inline comments. Mar 3 2021, 1:36 AM

llvm/test/CodeGen/AArch64/vecreduce-fmax-legalization.ll
Line 3: The main idea is to check that the expanded reduction is emitted when `fullfp16` is not specified. I guess the `FP16` check can be removed altogether if we don't care about checking the case where the intrinsic can be lowered.
llvm/test/CodeGen/AArch64/vecreduce-fmin-legalization.ll
Line 68: Good catch, I keep forgetting pieces when I `git add`.

david-arm added inline comments. Mar 3 2021, 1:38 AM

llvm/test/CodeGen/AArch64/vecreduce-fmax-legalization.ll
Line 3: Hi @fhahn, sure, that would work too and it's a good point. I was just thinking that fewer RUN lines meant faster test suites, that's all.

nikic added a subscriber: nikic. Mar 3 2021, 1:39 AM

nikic added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
Line 10488: Might be better to not mark them Custom in the first place? https://github.com/llvm/llvm-project/blob/3b47bd32f9df4a57db98db5f35e680c7bd9fde3e/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp#L1019-L1023
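One way to read nikic's suggestion is to decide the action per element type at the point where operation actions are set up. The reduction is then either custom-lowered or left to the generic expansion, instead of being registered as Custom and then returning an empty `SDValue()` from the lowering hook.

The sketch below models only that decision. `Action`, `Elt` and `fminmaxReductionAction` are hypothetical stand-ins, not LLVM's `LegalizeAction`/`MVT` API. It assumes, as this patch does, that only the f16 case depends on `fullfp16`, because the half-precision forms of `fmaxnmv`/`fminnmv` require the Armv8.2 FP16 extension.

```cpp
#include <cassert>

// Hypothetical stand-ins for a legalize action and a vector element type;
// these are not LLVM's LegalizeAction or MVT.
enum class Action { Custom, Expand };
enum class Elt { F16, F32, F64 };

// How a VECREDUCE_FMAX/FMIN on a NEON vector with element type `elt` would be
// handled. Without fullfp16 there is no .4h/.8h fmaxnmv/fminnmv, so the f16
// case is left to the generic expansion instead of the custom lowering.
Action fminmaxReductionAction(Elt elt, bool hasFullFP16) {
  if (elt == Elt::F16 && !hasFullFP16)
    return Action::Expand;
  return Action::Custom;
}
```

In the backend, this would presumably mean guarding the `setOperationAction(ISD::VECREDUCE_FMAX, VT, Custom)` and `VECREDUCE_FMIN` calls for the f16 vector types with a `Subtarget->hasFullFP16()` check. The lowering hook would then never see an f16 reduction it cannot handle.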

fhahn added inline comments. Mar 3 2021, 1:49 AM

llvm/test/CodeGen/AArch64/vecreduce-fmax-legalization.ll
Line 3:
> I was just thinking that fewer RUN lines meant faster test suites that's all.

That's also a good point. I'm not sure how things will shake out with more tests for `half`; I think we should have fp16 tests both with and without `fullfp16`. As long as we have that and can avoid most of the redundant check lines, I am happy whichever way we go :)
Line 67: Can you also throw in a test with vectors that are not directly legal, regardless of `fullfp16`?

Don't mark the fmin/fmax ops as legal if fullfp16 is not available.
Update test cases.

LemonBoy marked 4 inline comments as done. Mar 3 2021, 3:11 AM

LemonBoy added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
Line 10488: Indeed, fixed. Thanks for the suggestion!
llvm/test/CodeGen/AArch64/vecreduce-fmax-legalization.ll
Line 67: That's hitting a different problem in the legalization step; I'll try to address that in a separate patch.

Harbormaster completed remote builds in B91759: Diff 327707. Mar 3 2021, 4:34 AM

Harbormaster completed remote builds in B91780: Diff 327735. Mar 3 2021, 7:19 AM

LemonBoy mentioned this in D97859: [LegalizeDAG] Implement promotion rules for SELECT_CC. Mar 3 2021, 7:25 AM

LemonBoy added a child revision: D97859: [LegalizeDAG] Implement promotion rules for SELECT_CC.

LGTM.

This revision is now accepted and ready to land. Mar 3 2021, 2:58 PM

Closed by commit rG8725b24c6d4a: [AArch64] Legalize horizontal fmax/fmin reductions on f16 vectors (authored by LemonBoy). Mar 5 2021, 7:09 AM

This revision was automatically updated to reflect the committed changes.

LemonBoy added a commit: rG8725b24c6d4a: [AArch64] Legalize horizontal fmax/fmin reductions on f16 vectors.

LemonBoy mentioned this in rG2ec43e416734: [LegalizeDAG] Implement promotion rules for SELECT_CC. Mar 5 2021, 9:23 AM

Revision Contents

Path                                                        Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp             8 lines
llvm/test/CodeGen/AArch64/vecreduce-fmax-legalization.ll    106 lines
llvm/test/CodeGen/AArch64/vecreduce-fmin-legalization.ll    57 lines

Diff 327707

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


[... 10,477 unchanged lines not shown ...]
   case ISD::VECREDUCE_SMAX:
     return getReductionSDNode(AArch64ISD::SMAXV, dl, Op, DAG);
   case ISD::VECREDUCE_SMIN:
     return getReductionSDNode(AArch64ISD::SMINV, dl, Op, DAG);
   case ISD::VECREDUCE_UMAX:
     return getReductionSDNode(AArch64ISD::UMAXV, dl, Op, DAG);
   case ISD::VECREDUCE_UMIN:
     return getReductionSDNode(AArch64ISD::UMINV, dl, Op, DAG);
   case ISD::VECREDUCE_FMAX: {
+    // Expand the reduction if the CPU cannot handle it.
+    if (SrcVT.getVectorElementType() == MVT::f16 && !Subtarget->hasFullFP16())
+      return SDValue();
+
     return DAG.getNode(
         ISD::INTRINSIC_WO_CHAIN, dl, Op.getValueType(),
         DAG.getConstant(Intrinsic::aarch64_neon_fmaxnmv, dl, MVT::i32),
         Src);
   }
   case ISD::VECREDUCE_FMIN: {
+    // Expand the reduction if the CPU cannot handle it.
+    if (SrcVT.getVectorElementType() == MVT::f16 && !Subtarget->hasFullFP16())
+      return SDValue();
+
     return DAG.getNode(
         ISD::INTRINSIC_WO_CHAIN, dl, Op.getValueType(),
         DAG.getConstant(Intrinsic::aarch64_neon_fminnmv, dl, MVT::i32),
         Src);
   }
   default:
     llvm_unreachable("Unhandled reduction");
   }
[... 6,828 unchanged lines not shown ...]

llvm/test/CodeGen/AArch64/vecreduce-fmax-legalization.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mattr=+neon | FileCheck %s --check-prefix=CHECK
+; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mattr=+neon,+fullfp16 | FileCheck %s --check-prefix=FP16
 
 declare half @llvm.vector.reduce.fmax.v1f16(<1 x half> %a)
 declare float @llvm.vector.reduce.fmax.v1f32(<1 x float> %a)
 declare double @llvm.vector.reduce.fmax.v1f64(<1 x double> %a)
 declare fp128 @llvm.vector.reduce.fmax.v1f128(<1 x fp128> %a)
 
+declare half @llvm.vector.reduce.fmax.v4f16(<4 x half> %a)
 declare float @llvm.vector.reduce.fmax.v3f32(<3 x float> %a)
 declare fp128 @llvm.vector.reduce.fmax.v2f128(<2 x fp128> %a)
 declare float @llvm.vector.reduce.fmax.v16f32(<16 x float> %a)
 
 define half @test_v1f16(<1 x half> %a) nounwind {
 ; CHECK-LABEL: test_v1f16:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ret
+;
+; FP16-LABEL: test_v1f16:
+; FP16: // %bb.0:
+; FP16-NEXT: ret
   %b = call nnan half @llvm.vector.reduce.fmax.v1f16(<1 x half> %a)
   ret half %b
 }
 
 define float @test_v1f32(<1 x float> %a) nounwind {
 ; CHECK-LABEL: test_v1f32:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
 ; CHECK-NEXT: // kill: def $s0 killed $s0 killed $q0
 ; CHECK-NEXT: ret
+;
+; FP16-LABEL: test_v1f32:
+; FP16: // %bb.0:
+; FP16-NEXT: // kill: def $d0 killed $d0 def $q0
+; FP16-NEXT: // kill: def $s0 killed $s0 killed $q0
+; FP16-NEXT: ret
   %b = call nnan float @llvm.vector.reduce.fmax.v1f32(<1 x float> %a)
   ret float %b
 }
 
 define double @test_v1f64(<1 x double> %a) nounwind {
 ; CHECK-LABEL: test_v1f64:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ret
+;
+; FP16-LABEL: test_v1f64:
+; FP16: // %bb.0:
+; FP16-NEXT: ret
   %b = call nnan double @llvm.vector.reduce.fmax.v1f64(<1 x double> %a)
   ret double %b
 }
 
 define fp128 @test_v1f128(<1 x fp128> %a) nounwind {
 ; CHECK-LABEL: test_v1f128:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ret
+;
+; FP16-LABEL: test_v1f128:
+; FP16: // %bb.0:
+; FP16-NEXT: ret
   %b = call nnan fp128 @llvm.vector.reduce.fmax.v1f128(<1 x fp128> %a)
   ret fp128 %b
 }
 
+define half @test_v4f16(<4 x half> %a) nounwind {
+; CHECK-LABEL: test_v4f16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
+; CHECK-NEXT: mov h3, v0.h[1]
+; CHECK-NEXT: mov h1, v0.h[3]
+; CHECK-NEXT: mov h2, v0.h[2]
+; CHECK-NEXT: fcvt s0, h0
+; CHECK-NEXT: fcvt s3, h3
+; CHECK-NEXT: fmaxnm s0, s0, s3
+; CHECK-NEXT: fcvt h0, s0
+; CHECK-NEXT: fcvt s2, h2
+; CHECK-NEXT: fcvt s0, h0
+; CHECK-NEXT: fmaxnm s0, s0, s2
+; CHECK-NEXT: fcvt h0, s0
+; CHECK-NEXT: fcvt s0, h0
+; CHECK-NEXT: fcvt s1, h1
+; CHECK-NEXT: fmaxnm s0, s0, s1
+; CHECK-NEXT: fcvt h0, s0
+; CHECK-NEXT: ret
+;
+; FP16-LABEL: test_v4f16:
+; FP16: // %bb.0:
+; FP16-NEXT: fmaxnmv h0, v0.4h
+; FP16-NEXT: ret
+  %b = call nnan half @llvm.vector.reduce.fmax.v4f16(<4 x half> %a)
+  ret half %b
+}
+
+define half @test_v4f16_ninf(<4 x half> %a) nounwind {
+; CHECK-LABEL: test_v4f16_ninf:
+; CHECK: // %bb.0:
+; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
+; CHECK-NEXT: mov h3, v0.h[1]
+; CHECK-NEXT: mov h1, v0.h[3]
+; CHECK-NEXT: mov h2, v0.h[2]
+; CHECK-NEXT: fcvt s0, h0
+; CHECK-NEXT: fcvt s3, h3
+; CHECK-NEXT: fmaxnm s0, s0, s3
+; CHECK-NEXT: fcvt h0, s0
+; CHECK-NEXT: fcvt s2, h2
+; CHECK-NEXT: fcvt s0, h0
+; CHECK-NEXT: fmaxnm s0, s0, s2
+; CHECK-NEXT: fcvt h0, s0
+; CHECK-NEXT: fcvt s0, h0
+; CHECK-NEXT: fcvt s1, h1
+; CHECK-NEXT: fmaxnm s0, s0, s1
+; CHECK-NEXT: fcvt h0, s0
+; CHECK-NEXT: ret
+;
+; FP16-LABEL: test_v4f16_ninf:
+; FP16: // %bb.0:
+; FP16-NEXT: fmaxnmv h0, v0.4h
+; FP16-NEXT: ret
+  %b = call nnan ninf half @llvm.vector.reduce.fmax.v4f16(<4 x half> %a)
+  ret half %b
+}
+
 define float @test_v3f32(<3 x float> %a) nounwind {
 ; CHECK-LABEL: test_v3f32:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: mov w8, #-8388608
 ; CHECK-NEXT: fmov s1, w8
 ; CHECK-NEXT: mov v0.s[3], v1.s[0]
 ; CHECK-NEXT: fmaxnmv s0, v0.4s
 ; CHECK-NEXT: ret
+;
+; FP16-LABEL: test_v3f32:
+; FP16: // %bb.0:
+; FP16-NEXT: mov w8, #-8388608
+; FP16-NEXT: fmov s1, w8
+; FP16-NEXT: mov v0.s[3], v1.s[0]
+; FP16-NEXT: fmaxnmv s0, v0.4s
+; FP16-NEXT: ret
   %b = call nnan float @llvm.vector.reduce.fmax.v3f32(<3 x float> %a)
   ret float %b
 }
 
 define float @test_v3f32_ninf(<3 x float> %a) nounwind {
 ; CHECK-LABEL: test_v3f32_ninf:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: mov w8, #-8388609
 ; CHECK-NEXT: fmov s1, w8
 ; CHECK-NEXT: mov v0.s[3], v1.s[0]
 ; CHECK-NEXT: fmaxnmv s0, v0.4s
 ; CHECK-NEXT: ret
+;
+; FP16-LABEL: test_v3f32_ninf:
+; FP16: // %bb.0:
+; FP16-NEXT: mov w8, #-8388609
+; FP16-NEXT: fmov s1, w8
+; FP16-NEXT: mov v0.s[3], v1.s[0]
+; FP16-NEXT: fmaxnmv s0, v0.4s
+; FP16-NEXT: ret
   %b = call nnan ninf float @llvm.vector.reduce.fmax.v3f32(<3 x float> %a)
   ret float %b
 }
 
 define fp128 @test_v2f128(<2 x fp128> %a) nounwind {
 ; CHECK-LABEL: test_v2f128:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: b fmaxl
+;
+; FP16-LABEL: test_v2f128:
+; FP16: // %bb.0:
+; FP16-NEXT: b fmaxl
   %b = call nnan fp128 @llvm.vector.reduce.fmax.v2f128(<2 x fp128> %a)
   ret fp128 %b
 }
 
 define float @test_v16f32(<16 x float> %a) nounwind {
 ; CHECK-LABEL: test_v16f32:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: fmaxnm v1.4s, v1.4s, v3.4s
 ; CHECK-NEXT: fmaxnm v0.4s, v0.4s, v2.4s
 ; CHECK-NEXT: fmaxnm v0.4s, v0.4s, v1.4s
 ; CHECK-NEXT: fmaxnmv s0, v0.4s
 ; CHECK-NEXT: ret
+;
+; FP16-LABEL: test_v16f32:
+; FP16: // %bb.0:
+; FP16-NEXT: fmaxnm v1.4s, v1.4s, v3.4s
+; FP16-NEXT: fmaxnm v0.4s, v0.4s, v2.4s
+; FP16-NEXT: fmaxnm v0.4s, v0.4s, v1.4s
+; FP16-NEXT: fmaxnmv s0, v0.4s
+; FP16-NEXT: ret
   %b = call nnan float @llvm.vector.reduce.fmax.v16f32(<16 x float> %a)
   ret float %b
 }
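Without `fullfp16`, the CHECK lines for `test_v4f16` are a sequential expansion. Lane 0 is combined with lanes 1, 2 and 3 in turn. Each step converts to single precision (`fcvt s, h`), applies `fmaxnm`, and converts back (`fcvt h, s`). On non-NaN inputs, `fmaxnm` returns one of its operands, so the round trip through f32 loses nothing and the result can be modelled in `float`.

The sketch below assumes lane values that are non-NaN (the tests use `nnan`) and exactly representable in half precision. `expandedFMaxReduce` and `expandedFMinReduce` are hypothetical names, and `std::fmax`/`std::fmin` stand in for `fmaxnm`/`fminnm`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Model of the expanded (no fullfp16) reduction: fold the lanes in order
// 0, 1, 2, 3, as the fcvt/fmaxnm/fcvt sequence in the CHECK lines does.
// Assumes non-NaN values exactly representable in half precision, so the
// f16 <-> f32 conversions are exact and are not modelled here.
float expandedFMaxReduce(const std::vector<float> &lanes) {
  assert(!lanes.empty() && "cannot reduce an empty vector");
  float acc = lanes[0];
  for (std::size_t i = 1; i < lanes.size(); ++i)
    acc = std::fmax(acc, lanes[i]); // fmaxnm s0, s0, sN
  return acc;
}

// Same shape for the fmin test, with fminnm modelled by std::fmin.
float expandedFMinReduce(const std::vector<float> &lanes) {
  assert(!lanes.empty() && "cannot reduce an empty vector");
  float acc = lanes[0];
  for (std::size_t i = 1; i < lanes.size(); ++i)
    acc = std::fmin(acc, lanes[i]); // fminnm s0, s0, sN
  return acc;
}
```

For example, `expandedFMaxReduce({1.0f, 4.0f, 2.0f, 3.0f})` returns `4.0f`. That is the value the single `fmaxnmv h0, v0.4h` in the `FP16` lines is expected to produce for the same input.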

llvm/test/CodeGen/AArch64/vecreduce-fmin-legalization.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mattr=+neon | FileCheck %s --check-prefix=CHECK
 
 declare half @llvm.vector.reduce.fmin.v1f16(<1 x half> %a)
 declare float @llvm.vector.reduce.fmin.v1f32(<1 x float> %a)
 declare double @llvm.vector.reduce.fmin.v1f64(<1 x double> %a)
 declare fp128 @llvm.vector.reduce.fmin.v1f128(<1 x fp128> %a)
 
+declare half @llvm.vector.reduce.fmin.v4f16(<4 x half> %a)
 declare float @llvm.vector.reduce.fmin.v3f32(<3 x float> %a)
 declare fp128 @llvm.vector.reduce.fmin.v2f128(<2 x fp128> %a)
 declare float @llvm.vector.reduce.fmin.v16f32(<16 x float> %a)
 
 define half @test_v1f16(<1 x half> %a) nounwind {
 ; CHECK-LABEL: test_v1f16:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ret
[... 22 unchanged lines not shown ...]
 define fp128 @test_v1f128(<1 x fp128> %a) nounwind {
 ; CHECK-LABEL: test_v1f128:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ret
   %b = call nnan fp128 @llvm.vector.reduce.fmin.v1f128(<1 x fp128> %a)
   ret fp128 %b
 }
 
+define half @test_v4f16(<4 x half> %a) nounwind {
+; CHECK-LABEL: test_v4f16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
+; CHECK-NEXT: mov h3, v0.h[1]
+; CHECK-NEXT: mov h1, v0.h[3]
+; CHECK-NEXT: mov h2, v0.h[2]
+; CHECK-NEXT: fcvt s0, h0
+; CHECK-NEXT: fcvt s3, h3
+; CHECK-NEXT: fminnm s0, s0, s3
+; CHECK-NEXT: fcvt h0, s0
+; CHECK-NEXT: fcvt s2, h2
+; CHECK-NEXT: fcvt s0, h0
+; CHECK-NEXT: fminnm s0, s0, s2
+; CHECK-NEXT: fcvt h0, s0
+; CHECK-NEXT: fcvt s0, h0
+; CHECK-NEXT: fcvt s1, h1
+; CHECK-NEXT: fminnm s0, s0, s1
+; CHECK-NEXT: fcvt h0, s0
+; CHECK-NEXT: ret
+; FP16-LABEL: test_v4f16:
+; FP16: // %bb.0:
+; FP16-NEXT: fminnmv h0, v0.4h
+; FP16-NEXT: ret
+  %b = call nnan half @llvm.vector.reduce.fmin.v4f16(<4 x half> %a)
+  ret half %b
+}
+
+define half @test_v4f16_ninf(<4 x half> %a) nounwind {
+; CHECK-LABEL: test_v4f16_ninf:
+; CHECK: // %bb.0:
+; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
+; CHECK-NEXT: mov h3, v0.h[1]
+; CHECK-NEXT: mov h1, v0.h[3]
+; CHECK-NEXT: mov h2, v0.h[2]
+; CHECK-NEXT: fcvt s0, h0
+; CHECK-NEXT: fcvt s3, h3
+; CHECK-NEXT: fminnm s0, s0, s3
+; CHECK-NEXT: fcvt h0, s0
+; CHECK-NEXT: fcvt s2, h2
+; CHECK-NEXT: fcvt s0, h0
+; CHECK-NEXT: fminnm s0, s0, s2
+; CHECK-NEXT: fcvt h0, s0
+; CHECK-NEXT: fcvt s0, h0
+; CHECK-NEXT: fcvt s1, h1
+; CHECK-NEXT: fminnm s0, s0, s1
+; CHECK-NEXT: fcvt h0, s0
+; CHECK-NEXT: ret
+; FP16-LABEL: test_v4f16_ninf:
+; FP16: // %bb.0:
+; FP16-NEXT: fminnmv h0, v0.4h
+; FP16-NEXT: ret
+  %b = call nnan ninf half @llvm.vector.reduce.fmin.v4f16(<4 x half> %a)
+  ret half %b
+}
+
 define float @test_v3f32(<3 x float> %a) nounwind {
 ; CHECK-LABEL: test_v3f32:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: mov w8, #2139095040
 ; CHECK-NEXT: fmov s1, w8
 ; CHECK-NEXT: mov v0.s[3], v1.s[0]
 ; CHECK-NEXT: fminnmv s0, v0.4s
 ; CHECK-NEXT: ret
[... 35 unchanged lines not shown ...]


[AArch64] Legalize horizontal fmax/fmin reductions on f16 vectors (Closed, Public)

