This is an archive of the discontinued LLVM Phabricator instance.


Differential D127275

[MVE] Fold fadd(select(..., +0.0)) into a predicated fadd
Closed · Public

Authored by david-arm on Jun 8 2022, 1:33 AM.


Details

Reviewers

sdesmalen
dmgreen

Commits

rG007917b95ce2: [MVE] Fold fadd(select(..., +0.0)) into a predicated fadd

Summary

We already have patterns for matching fadd(select(..., -0.0)),
but an upcoming patch will lead to patterns using +0.0 as the
identity instead of -0.0. I'm adding support for these patterns
now to avoid any regressions for MVE.


Event Timeline

david-arm created this revision. (Jun 8 2022, 1:33 AM)

Herald added a project: Restricted Project. (Jun 8 2022, 1:33 AM)

Herald added a subscriber: hiraditya.

david-arm requested review of this revision. (Jun 8 2022, 1:33 AM)

Herald added a project: Restricted Project. (Jun 8 2022, 1:33 AM)

Herald added a subscriber: llvm-commits.

david-arm added a child revision: D126774: [InstCombine] Use +0.0 instead of -0.0 as the FP identity for some folds. (Jun 8 2022, 1:36 AM)

david-arm mentioned this in D126774: [InstCombine] Use +0.0 instead of -0.0 as the FP identity for some folds.

Harbormaster completed remote builds in B168496: Diff 435073. (Jun 8 2022, 2:14 AM)

spatel added a comment.

If this is inverting the transform from D126774, do we need 'nsz' to avoid miscompiling -0.0?
https://alive2.llvm.org/ce/z/Z6sngi

david-arm added a comment.

In D127275#3566222, @spatel wrote:

If this is inverting the transform from D126774, do we need 'nsz' to avoid miscompiling -0.0?
https://alive2.llvm.org/ce/z/Z6sngi

I'm assuming that for MVE an fadd of +0 for a given lane is equivalent to simply not performing the fadd for that lane (i.e. by using predication). This is the same assumption we were making for the -0 case, i.e. that adding either -0 or +0 leaves the lane unchanged. Perhaps @dmgreen can confirm this?

But if necessary I can certainly add an additional check for nsz.

dmgreen added a comment.

I'm assuming that for MVE an fadd of +0 for a given lane is equivalent to simply not performing the fadd for that lane (i.e. by using predication).

Yes, but I'm not sure how that matters. fadd -0.0, 0.0 should produce 0.0. The same should be true if the first argument is a variable holding -0.0 and the second comes from a select that could be 0.0: the output should be 0.0, not the value of the variable.

llvm/lib/Target/ARM/ARMISelLowering.cpp
16709	Can you update this comment? The lambda name is no longer correct either.
16715	ImmVal == 0 would show the intent a little better.
llvm/test/CodeGen/Thumb2/mve-pred-selectop3.ll
375	Can you add nsz variants of the tests and make sure only those get folded?

david-arm added inline comments. (Jun 8 2022, 5:41 AM)

llvm/test/CodeGen/Thumb2/mve-pred-selectop3.ll
375	I was a bit confused by your comment starting `Yes, but I'm not sure how that matters.` Do you agree with @spatel that we should only apply the fold for '+0.0' if the nsz flag is set on the select? Hopefully the FMF flags on the IR select have been propagated to the VSELECT DAG node.

dmgreen added a comment.

Oh, sorry - I did misread your comment. I should have said "No, that's not how floating-point fadd works" :)
fadd -0.0, 0.0 is 0.0. The identity value for a fadd is -0.0. A vaddt.f32 Qd, Qn, Qm with a false lane predicate keeps the original value of Qd for that lane (so if Qd==Qn, the original value from the input is used, without any addition).

Luckily though, the transform you are altering is in terms of VSELECT and FADD nodes, which have the standard definitions, and you can think about without considering MVE specifics. The transform just needs to be valid in general.

Old: fadd (select c, y, 0), x
true -> fadd y, x
false -> fadd 0, x

New: select c, (fadd x, y), x
true -> fadd x, y
false -> x

This is only valid if fadd 0, x is x, which needs nsz.

Added checks for the nsz flag on the select when the identity is +0.0.

Thanks for the explanation @dmgreen!

spatel added inline comments. (Jun 8 2022, 8:53 AM)

llvm/lib/Target/ARM/ARMISelLowering.cpp
16729	This is not strictly correct. If the 'fadd' doesn't have 'nsz', then the transform results in an 'nsz' value where the original sequence did not. The transform is safe as long as the 'fadd' has 'nsz', so that's the flag we should use to enable the transform. We can propagate the fadd's 'nsz' to the new 'select' even if the original 'select' was not 'nsz' itself. It's confusing, but you can run examples with Alive2 to see what's valid: https://alive2.llvm.org/ce/z/Z-cuhQ

david-arm added inline comments. (Jun 8 2022, 8:59 AM)

llvm/lib/Target/ARM/ARMISelLowering.cpp
16729	Doesn't this contradict https://reviews.llvm.org/D126774 then? In my original instcombine patch I think it was suggested that we should add nsz to the select instruction being generated by the vectoriser, whereas from this comment it sounds like we don't need to and can just rely upon the nsz flag on the IR fadd?

spatel added inline comments. (Jun 8 2022, 9:11 AM)

llvm/lib/Target/ARM/ARMISelLowering.cpp
16729	No, the transforms are not strictly reversible/bidirectional: https://alive2.llvm.org/ce/z/G42d8e In both cases, I think we need 'nsz' on the final value in the pattern to enable the transform safely. It is possible that we've created some unintended FMF propagation though when the flags are different on each value. The semantics are fuzzy and having different flags on values within a function seems highly unlikely, but we should probably add more tests to cover those possibilities.

Harbormaster completed remote builds in B168570: Diff 435156. (Jun 8 2022, 9:29 AM)

Use the flags for the fadd to determine the safety of the transformation and apply the NSZ flag to the select if needed.
Tweaked the tests to demonstrate the transform succeeds when the nsz flag is only on the fadd.

david-arm marked 2 inline comments as done. (Jun 9 2022, 1:34 AM)

david-arm added inline comments.

llvm/lib/Target/ARM/ARMISelLowering.cpp
16729	OK thanks for explaining and looking into this in detail! I guess differences in flags are much more likely to occur with inlining, especially as part of LTO where different translation units may be compiled differently.

Harbormaster completed remote builds in B168760: Diff 435449. (Jun 9 2022, 2:25 AM)

dmgreen added a comment.

Thanks for putting this patch together. That's a great help and the regressions were pretty large otherwise! I am still getting some odd results from the change, but I think that is just folding the select 0 into a masked load, which is OK.

Other than using FaddFlags instead of SelFlags, this LGTM.

llvm/lib/Target/ARM/ARMISelLowering.cpp
16731	I don't think it is valid to transfer the flags to the select, unfortunately. nsz may be valid, but not the existing flags. https://alive2.llvm.org/ce/z/9u943A It should be valid to transfer the flags from the fadd, as that is the end result of the old pattern. https://alive2.llvm.org/ce/z/PzGZSw

This revision is now accepted and ready to land. (Jun 10 2022, 1:14 AM)

This revision was landed with ongoing or failed builds. (Jun 10 2022, 3:10 AM)

Closed by commit rG007917b95ce2: [MVE] Fold fadd(select(..., +0.0)) into a predicated fadd (authored by david-arm).

This revision was automatically updated to reflect the committed changes.

david-arm marked an inline comment as done.

david-arm added a commit: rG007917b95ce2: [MVE] Fold fadd(select(..., +0.0)) into a predicated fadd.

david-arm marked an inline comment as done. (Jun 10 2022, 3:11 AM)

david-arm added inline comments.

llvm/lib/Target/ARM/ARMISelLowering.cpp
16731	Thanks for the LGTM @dmgreen! I addressed this comment before merging.

spatel mentioned this in D127493: [NFC][InstCombine] Refactor InstCombinerImpl::foldSelectIntoOp. (Jun 10 2022, 8:58 AM)

Revision Contents

Path                                              Size
llvm/lib/Target/ARM/ARMISelLowering.cpp           5 lines
llvm/test/CodeGen/Thumb2/mve-pred-selectop3.ll    28 lines

Diff 435073

llvm/lib/Target/ARM/ARMISelLowering.cpp


@@ 16,700 unchanged lines not shown @@ static SDValue PerformFAddVSelectCombine(SDNode *N, SelectionDAG &DAG,
   // Turn (fadd x, (vselect c, y, -0.0)) into (vselect c, (fadd x, y), x)
   // The second form can be more easily turned into a predicated vadd, and
   // possibly combined into a fma to become a predicated vfma.
   SDValue Op0 = N->getOperand(0);
   SDValue Op1 = N->getOperand(1);
   EVT VT = N->getValueType(0);
   SDLoc DL(N);

   // The identity element for a fadd is -0.0, which these VMOV's represent.
   auto isNegativeZeroSplat = [&](SDValue Op) {
     if (Op.getOpcode() != ISD::BITCAST ||
         Op.getOperand(0).getOpcode() != ARMISD::VMOVIMM)
       return false;
-    if (VT == MVT::v4f32 && Op.getOperand(0).getConstantOperandVal(0) == 1664)
+    uint64_t ImmVal = Op.getOperand(0).getConstantOperandVal(0);
+    if (VT == MVT::v4f32 && (ImmVal == 1664 || !ImmVal))
       return true;
-    if (VT == MVT::v8f16 && Op.getOperand(0).getConstantOperandVal(0) == 2688)
+    if (VT == MVT::v8f16 && (ImmVal == 2688 || !ImmVal))
       return true;
     return false;
   };

   if (Op0.getOpcode() == ISD::VSELECT && Op1.getOpcode() != ISD::VSELECT)
     std::swap(Op0, Op1);

   if (Op1.getOpcode() != ISD::VSELECT ||
       !isNegativeZeroSplat(Op1.getOperand(2)))
     return SDValue();
   SDValue FAdd =
       DAG.getNode(ISD::FADD, DL, VT, Op0, Op1.getOperand(1), N->getFlags());
   return DAG.getNode(ISD::VSELECT, DL, VT, Op1.getOperand(0), FAdd, Op0);
 }

 /// PerformVDIVCombine - VCVT (fixed-point to floating-point, Advanced SIMD)
 /// can replace combinations of VCVT (integer to floating-point) and VDIV
 /// when the VDIV has a constant operand that is a power of 2.
 ///
 /// Example (assume d17 = <float 8.000000e+00, float 8.000000e+00>):
 /// vcvt.f32.s32 d16, d16
 /// vdiv.f32 d16, d17, d16
@@ 4,985 unchanged lines not shown @@

llvm/test/CodeGen/Thumb2/mve-pred-selectop3.ll

@@ 357 unchanged lines not shown @@
 ; CHECK-NEXT: bx lr
 entry:
   %c = call <4 x i1> @llvm.arm.mve.vctp32(i32 %n)
   %a = select <4 x i1> %c, <4 x float> %y, <4 x float> <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>
   %b = fadd <4 x float> %a, %x
   ret <4 x float> %b
 }

+define arm_aapcs_vfpcc <4 x float> @fadd_v4f32_x2(<4 x float> %x, <4 x float> %y, i32 %n) {
+; CHECK-LABEL: fadd_v4f32_x2:
+; CHECK: @ %bb.0: @ %entry
+; CHECK-NEXT: vctp.32 r0
+; CHECK-NEXT: vpst
+; CHECK-NEXT: vaddt.f32 q0, q0, q1
+; CHECK-NEXT: bx lr
+entry:
+  %c = call <4 x i1> @llvm.arm.mve.vctp32(i32 %n)
+  %a = select <4 x i1> %c, <4 x float> %y, <4 x float> <float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00>
+  %b = fadd <4 x float> %a, %x
+  ret <4 x float> %b
+}

 define arm_aapcs_vfpcc <8 x half> @fadd_v8f16_x(<8 x half> %x, <8 x half> %y, i32 %n) {
 ; CHECK-LABEL: fadd_v8f16_x:
 ; CHECK: @ %bb.0: @ %entry
 ; CHECK-NEXT: vctp.16 r0
 ; CHECK-NEXT: vpst
 ; CHECK-NEXT: vaddt.f16 q0, q0, q1
 ; CHECK-NEXT: bx lr
 entry:
   %c = call <8 x i1> @llvm.arm.mve.vctp16(i32 %n)
   %a = select <8 x i1> %c, <8 x half> %y, <8 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>
   %b = fadd <8 x half> %a, %x
   ret <8 x half> %b
 }

+define arm_aapcs_vfpcc <8 x half> @fadd_v8f16_x2(<8 x half> %x, <8 x half> %y, i32 %n) {
+; CHECK-LABEL: fadd_v8f16_x2:
+; CHECK: @ %bb.0: @ %entry
+; CHECK-NEXT: vctp.16 r0
+; CHECK-NEXT: vpst
+; CHECK-NEXT: vaddt.f16 q0, q0, q1
+; CHECK-NEXT: bx lr
+entry:
+  %c = call <8 x i1> @llvm.arm.mve.vctp16(i32 %n)
+  %a = select <8 x i1> %c, <8 x half> %y, <8 x half> <half 0x0000, half 0x00000, half 0x00000, half 0x00000, half 0x00000, half 0x00000, half 0x00000, half 0x00000>
+  %b = fadd <8 x half> %a, %x
+  ret <8 x half> %b
+}

 define arm_aapcs_vfpcc <4 x float> @fsub_v4f32_x(<4 x float> %x, <4 x float> %y, i32 %n) {
 ; CHECK-LABEL: fsub_v4f32_x:
 ; CHECK: @ %bb.0: @ %entry
 ; CHECK-NEXT: vctp.32 r0
 ; CHECK-NEXT: vpst
 ; CHECK-NEXT: vsubt.f32 q0, q0, q1
 ; CHECK-NEXT: bx lr
 entry:
@@ 2,376 unchanged lines not shown @@