This is an archive of the discontinued LLVM Phabricator instance.

Differential D59066

[TargetLowering] improve the default expansion of uaddsat/usubsat
ClosedPublic

Authored by spatel on Mar 6 2019, 6:10 PM.

Details

Reviewers

nikic
RKSimon
craig.topper
lebedev.ri
efriedma

Commits

rG6a6e808b699b: [TargetLowering] improve the default expansion of uaddsat/usubsat
rL356332: [TargetLowering] improve the default expansion of uaddsat/usubsat

Summary

This is an alternative to D59006 that achieves identical results for x86, and also makes an improvement for AArch64. The logic is pushing the limits of target-independence, but this is what it takes to not induce any regressions for x86, and there's no harm to AArch64, so it's better all-around?

Normally, we'd try to improve the generic combines underlying the sub-optimal output that we see in the test diffs, but that did not look easy or even possible for the cases I looked at. For example, the AArch64 bic/bic appears to be missed because one of those is a generic 'not' op, but the other is already a BIC node.

Diff Detail

Repository: rL LLVM

Event Timeline

spatel created this revision. Mar 6 2019, 6:10 PM

Herald added a project: Restricted Project. Mar 6 2019, 6:10 PM

Herald added subscribers: jdoerfert, hiraditya, kristof.beyls and 2 others.

This is really 2 independent changes:

1. The changes to min/max expansion are what help x86.
2. The changes to UADDO/USUBO expansion are what help AArch64.

The 1st part may be over-the-line, but the 2nd is non-controversial AFAICT, so let's just try that hunk for now?

(It's really just the UADDO side that affects the AArch64 tests, but keep both changes for symmetry.)

nikic mentioned this in D59174: [DAGCombine] Fold (x & ~y) | y patterns. Mar 9 2019, 2:28 AM

The AArch64-specific improvements in D59187 / D59174 seem like good patches, but even without those test diffs, do we still want this general optimization, so other targets are not similarly impacted? (We don't have these tests for any other in-tree targets AFAIK.)

@spatel D59174 is not AArch64 specific and covers what this patch does, but only for the case where VSELECT is expanded. An alternative would be to add a combine for vselect zero_or_neg_one, neg_one, x -> zero_or_neg_one | x and vselect zero_or_neg_one, 0, x -> ~zero_or_neg_one & x, though it's arguable whether the latter is an improvement in general.

Also okay with what this patch does, but the other test changes from D59174 show that the pattern can also occur in other cases, so we might want to handle it more generally.
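The two vselect combines suggested above can be checked with a small scalar model. This is only a sketch: `lane_select` and `check_vselect_combines` are hypothetical names, a single 32-bit value stands in for one vector lane, and the condition is assumed to be restricted to all-zeros or all-ones, as `zero_or_neg_one` implies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one vselect lane: a non-zero condition picks `t`.
inline uint32_t lane_select(uint32_t cond, uint32_t t, uint32_t f) {
  return cond != 0 ? t : f;
}

// Checks both proposed rewrites for a given x, for the only two condition
// values a zero-or-negative-one boolean lane can hold.
inline void check_vselect_combines(uint32_t x) {
  const uint32_t masks[] = {0u, ~0u};
  for (uint32_t m : masks) {
    // vselect m, neg_one, x  ->  m | x
    assert(lane_select(m, ~0u, x) == (m | x));
    // vselect m, 0, x  ->  ~m & x
    assert(lane_select(m, 0u, x) == (~m & x));
  }
}
```

The second rewrite needs a 'not' in addition to the 'and', which is why it is only a clear win on targets that have and-not as a single instruction.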

In D59066#1426663, @nikic wrote:

@spatel D59174 is not AArch64 specific and covers what this patch does, but only for the case where VSELECT is expanded. An alternative would be to add a combine for vselect zero_or_neg_one, neg_one, x -> zero_or_neg_one | x and vselect zero_or_neg_one, 0, x -> ~zero_or_neg_one & x, though it's arguable whether the latter is an improvement in general.

Ah, sorry - I missed the x86 diff there the first time I looked. So yes, the benefit of that patch seems clear.

And yes, the pattern with 'and-not' isn't clearly a win unless the target has that op as a single instruction, but it seems common enough that we would choose that as the optimized form here in the legalizer. So I'd go with all of these patches to try to make sure that we get optimized code for the intrinsics.

Looks good to me.

This revision is now accepted and ready to land. Mar 17 2019, 2:18 AM

Closed by commit rL356332: [TargetLowering] improve the default expansion of uaddsat/usubsat (authored by spatel). Mar 17 2019, 7:57 AM

This revision was automatically updated to reflect the committed changes.

nikic mentioned this in rL356333: [DAGCombine] Fold (x & ~y) | y patterns. Mar 17 2019, 8:44 AM

nikic mentioned this in rG9a4453592bfe: [DAGCombine] Fold (x & ~y) | y patterns.

spatel mentioned this in D59006: [x86] improve the default expansion of uaddsat/usubsat. Mar 20 2019, 3:43 PM

Revision Contents

Path                                                      Size
llvm/trunk/lib/CodeGen/SelectionDAG/TargetLowering.cpp    11 lines
llvm/trunk/test/CodeGen/AArch64/uadd_sat_vec.ll           21 lines
Diff 191018

llvm/trunk/lib/CodeGen/SelectionDAG/TargetLowering.cpp

@@ SDValue TargetLowering::expandAddSubSat(SDNode *Node, SelectionDAG &DAG) const {
   SDValue Result = DAG.getNode(OverflowOp, dl, DAG.getVTList(VT, BoolVT),
                                LHS, RHS);
   SDValue SumDiff = Result.getValue(0);
   SDValue Overflow = Result.getValue(1);
   SDValue Zero = DAG.getConstant(0, dl, VT);
   SDValue AllOnes = DAG.getAllOnesConstant(dl, VT);

   if (Opcode == ISD::UADDSAT) {
+    if (getBooleanContents(VT) == ZeroOrNegativeOneBooleanContent) {
+      // (LHS + RHS) | OverflowMask
+      SDValue OverflowMask = DAG.getSExtOrTrunc(Overflow, dl, VT);
+      return DAG.getNode(ISD::OR, dl, VT, SumDiff, OverflowMask);
+    }
     // Overflow ? 0xffff.... : (LHS + RHS)
     return DAG.getSelect(dl, VT, Overflow, AllOnes, SumDiff);
   } else if (Opcode == ISD::USUBSAT) {
+    if (getBooleanContents(VT) == ZeroOrNegativeOneBooleanContent) {
+      // (LHS - RHS) & ~OverflowMask
+      SDValue OverflowMask = DAG.getSExtOrTrunc(Overflow, dl, VT);
+      SDValue Not = DAG.getNOT(dl, OverflowMask, VT);
+      return DAG.getNode(ISD::AND, dl, VT, SumDiff, Not);
+    }
     // Overflow ? 0 : (LHS - RHS)
     return DAG.getSelect(dl, VT, Overflow, Zero, SumDiff);
   } else {
     // SatMax -> Overflow && SumDiff < 0
     // SatMin -> Overflow && SumDiff >= 0
     APInt MinVal = APInt::getSignedMinValue(BitWidth);
     APInt MaxVal = APInt::getSignedMaxValue(BitWidth);
     SDValue SatMin = DAG.getConstant(MinVal, dl, VT);
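As a rough illustration of why the new UADDSAT/USUBSAT branches are equivalent to the select-based fallback, here is a minimal scalar sketch. It assumes a single 8-bit lane and models the sign-extended overflow bit as an all-zeros or all-ones byte; every function name below is hypothetical and none of this is the LLVM implementation itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for sign-extending the overflow bit to the lane
// width: false -> 0x00, true -> 0xFF.
inline uint8_t overflow_mask(bool overflow) {
  return static_cast<uint8_t>(-static_cast<int>(overflow));
}

// Select form: Overflow ? 0xff : (LHS + RHS)
inline uint8_t uaddsat_select(uint8_t lhs, uint8_t rhs) {
  uint8_t sum = static_cast<uint8_t>(lhs + rhs);
  bool overflow = sum < lhs;  // unsigned wrap-around happened
  return overflow ? uint8_t{0xFF} : sum;
}

// Mask form: (LHS + RHS) | OverflowMask
inline uint8_t uaddsat_mask(uint8_t lhs, uint8_t rhs) {
  uint8_t sum = static_cast<uint8_t>(lhs + rhs);
  return static_cast<uint8_t>(sum | overflow_mask(sum < lhs));
}

// Select form: Overflow ? 0 : (LHS - RHS)
inline uint8_t usubsat_select(uint8_t lhs, uint8_t rhs) {
  uint8_t diff = static_cast<uint8_t>(lhs - rhs);
  bool overflow = lhs < rhs;  // unsigned borrow happened
  return overflow ? uint8_t{0} : diff;
}

// Mask form: (LHS - RHS) & ~OverflowMask
inline uint8_t usubsat_mask(uint8_t lhs, uint8_t rhs) {
  uint8_t diff = static_cast<uint8_t>(lhs - rhs);
  return static_cast<uint8_t>(diff & ~overflow_mask(lhs < rhs));
}

// Compares the select and mask forms over every pair of 8-bit inputs.
inline void check_all_inputs() {
  for (unsigned a = 0; a < 256; ++a) {
    for (unsigned b = 0; b < 256; ++b) {
      uint8_t x = static_cast<uint8_t>(a);
      uint8_t y = static_cast<uint8_t>(b);
      assert(uaddsat_select(x, y) == uaddsat_mask(x, y));
      assert(usubsat_select(x, y) == usubsat_mask(x, y));
    }
  }
}
```

For example, `uaddsat_mask(250, 10)` saturates to 255 and `usubsat_mask(3, 5)` clamps to 0. The mask forms only apply when the overflow value is known to be all-zeros or all-ones, which is what the `getBooleanContents(VT)` check in the patch guards.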

llvm/trunk/test/CodeGen/AArch64/uadd_sat_vec.ll

@@
 ; CHECK-NEXT: ret
   ret <16 x i32> %z
 }

 define <2 x i64> @v2i64(<2 x i64> %x, <2 x i64> %y) nounwind {
 ; CHECK-LABEL: v2i64:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: add v1.2d, v0.2d, v1.2d
 ; CHECK-NEXT: cmhi v0.2d, v0.2d, v1.2d
-; CHECK-NEXT: bic v1.16b, v1.16b, v0.16b
-; CHECK-NEXT: orr v0.16b, v0.16b, v1.16b
+; CHECK-NEXT: orr v0.16b, v1.16b, v0.16b
 ; CHECK-NEXT: ret
   %z = call <2 x i64> @llvm.uadd.sat.v2i64(<2 x i64> %x, <2 x i64> %y)
   ret <2 x i64> %z
 }

 define <4 x i64> @v4i64(<4 x i64> %x, <4 x i64> %y) nounwind {
 ; CHECK-LABEL: v4i64:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: add v2.2d, v0.2d, v2.2d
 ; CHECK-NEXT: add v3.2d, v1.2d, v3.2d
 ; CHECK-NEXT: cmhi v0.2d, v0.2d, v2.2d
 ; CHECK-NEXT: cmhi v1.2d, v1.2d, v3.2d
-; CHECK-NEXT: bic v2.16b, v2.16b, v0.16b
-; CHECK-NEXT: bic v3.16b, v3.16b, v1.16b
-; CHECK-NEXT: orr v0.16b, v0.16b, v2.16b
-; CHECK-NEXT: orr v1.16b, v1.16b, v3.16b
+; CHECK-NEXT: orr v0.16b, v2.16b, v0.16b
+; CHECK-NEXT: orr v1.16b, v3.16b, v1.16b
 ; CHECK-NEXT: ret
   %z = call <4 x i64> @llvm.uadd.sat.v4i64(<4 x i64> %x, <4 x i64> %y)
   ret <4 x i64> %z
 }

 define <8 x i64> @v8i64(<8 x i64> %x, <8 x i64> %y) nounwind {
 ; CHECK-LABEL: v8i64:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: add v4.2d, v0.2d, v4.2d
 ; CHECK-NEXT: add v5.2d, v1.2d, v5.2d
 ; CHECK-NEXT: add v6.2d, v2.2d, v6.2d
 ; CHECK-NEXT: add v7.2d, v3.2d, v7.2d
 ; CHECK-NEXT: cmhi v0.2d, v0.2d, v4.2d
 ; CHECK-NEXT: cmhi v1.2d, v1.2d, v5.2d
 ; CHECK-NEXT: cmhi v2.2d, v2.2d, v6.2d
 ; CHECK-NEXT: cmhi v3.2d, v3.2d, v7.2d
-; CHECK-NEXT: bic v4.16b, v4.16b, v0.16b
-; CHECK-NEXT: bic v5.16b, v5.16b, v1.16b
-; CHECK-NEXT: bic v6.16b, v6.16b, v2.16b
-; CHECK-NEXT: bic v7.16b, v7.16b, v3.16b
-; CHECK-NEXT: orr v0.16b, v0.16b, v4.16b
-; CHECK-NEXT: orr v1.16b, v1.16b, v5.16b
-; CHECK-NEXT: orr v2.16b, v2.16b, v6.16b
-; CHECK-NEXT: orr v3.16b, v3.16b, v7.16b
+; CHECK-NEXT: orr v0.16b, v4.16b, v0.16b
+; CHECK-NEXT: orr v1.16b, v5.16b, v1.16b
+; CHECK-NEXT: orr v2.16b, v6.16b, v2.16b
+; CHECK-NEXT: orr v3.16b, v7.16b, v3.16b
 ; CHECK-NEXT: ret
   %z = call <8 x i64> @llvm.uadd.sat.v8i64(<8 x i64> %x, <8 x i64> %y)
   ret <8 x i64> %z
 }

 define <2 x i128> @v2i128(<2 x i128> %x, <2 x i128> %y) nounwind {
 ; CHECK-LABEL: v2i128:
 ; CHECK: // %bb.0:
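In the old CHECK lines, each `bic` clears the overflow-mask bits from the sum and the following `orr` sets them again, so the pair computes (sum & ~mask) | mask. That is the same value as the single `orr` in the new output, sum | mask. The sketch below checks this bitwise identity exhaustively on 8-bit values; the function names are hypothetical and the 8-bit width is only an assumption to keep the check small.

```cpp
#include <cassert>
#include <cstdint>

// Old sequence: bic (x & ~y), then orr with y.
inline uint8_t bic_then_orr(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>((x & ~y) | y);
}

// New sequence: a single orr.
inline uint8_t orr_only(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>(x | y);
}

// Confirms (x & ~y) | y == x | y for every pair of 8-bit values.
inline void check_fold() {
  for (unsigned a = 0; a < 256; ++a) {
    for (unsigned b = 0; b < 256; ++b) {
      uint8_t x = static_cast<uint8_t>(a);
      uint8_t y = static_cast<uint8_t>(b);
      assert(bic_then_orr(x, y) == orr_only(x, y));
    }
  }
}
```

Unlike the select-to-mask rewrite, this identity holds for any `y`, not only for all-zeros or all-ones masks.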