This is an archive of the discontinued LLVM Phabricator instance.

Differential D103755

[DAG] Fold neg(bvsplat(neg(x)) -> bvsplat(x)
Closed · Public

Authored by dmgreen on Jun 5 2021, 11:49 AM.

Details

Reviewers

efriedma
david-arm
sdesmalen
NickGuy
t.p.northover
RKSimon
craig.topper

Commits

rGb8c8bb07692c: [DAG] Fold neg(splat(neg(x)) -> splat(x)

Summary

This adds a fold of sub(0, buildvector_splat(sub(0, x))) -> buildvector_splat(x). The pattern can come up when lowering right shifts on AArch64, where a right shift is generated as a left shift by a negated amount.
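
As a rough illustration of why the fold is sound, the sketch below models a splat and a lane-wise negation on plain 64-bit lanes and checks that negating a splat of a negated scalar gives back a splat of the original scalar. The helpers `splat`, `neg`, `foldHolds` and `checkExamples` are hypothetical names invented for this example; they are not part of SelectionDAG, and the model assumes wrapping two's-complement arithmetic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy stand-ins for the DAG nodes; splat/neg/foldHolds are hypothetical names.
template <std::size_t N>
std::array<std::uint64_t, N> splat(std::uint64_t x) {
  std::array<std::uint64_t, N> v{};
  v.fill(x); // every lane holds the same scalar
  return v;
}

template <std::size_t N>
std::array<std::uint64_t, N> neg(const std::array<std::uint64_t, N> &v) {
  std::array<std::uint64_t, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = std::uint64_t{0} - v[i]; // sub(0, lane), wrapping modulo 2^64
  return r;
}

// True when neg(splat(neg(x))) equals splat(x) for a 2-lane vector.
bool foldHolds(std::uint64_t x) {
  return neg(splat<2>(std::uint64_t{0} - x)) == splat<2>(x);
}

// Spot-check a few inputs, including the value whose negation is itself.
void checkExamples() {
  assert(foldHolds(0));
  assert(foldHolds(7));
  assert(foldHolds(std::uint64_t{1} << 63));
}
```

Because unsigned subtraction wraps, the identity holds for every input in this model, including the value whose negation is itself.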

Event Timeline

dmgreen created this revision. Jun 5 2021, 11:49 AM

Herald added subscribers: ecnelises, hiraditya, kristof.beyls. Jun 5 2021, 11:49 AM

dmgreen requested review of this revision. Jun 5 2021, 11:49 AM

Herald added a project: Restricted Project. Jun 5 2021, 11:49 AM

Harbormaster completed remote builds in B107822: Diff 350069. Jun 5 2021, 11:50 AM

dmgreen mentioned this in D103756: [DAG] Allow isNullOrNullSplat to see truncated zeroes. Jun 5 2021, 11:51 AM

dmgreen added a parent revision: D103703: [AArch64] Remove AArch64ISD::NEG.

dmgreen added a child revision: D103756: [DAG] Allow isNullOrNullSplat to see truncated zeroes.

RKSimon added a comment.

Funnily enough I was wondering about this pattern the other day as a followup to D98778...

Should we always be folding unaryop(splat(x)) -> splat(unaryop(x)) if the unaryop is legal/custom on the scalar type? And then maybe extend that to binop(splat(x),splat(y)) -> splat(binop(x,y)) as well?

Matt added a subscriber: Matt. Jun 7 2021, 8:17 AM

dmgreen mentioned this in rGb889c6ee9911: [DAG] Allow isNullOrNullSplat to see truncated zeroes. Jun 8 2021, 2:19 AM

Rebase over D103756

In D103755#2800920, @RKSimon wrote:

Funnily enough I was wondering about this pattern the other day as a followup to D98778...

Should we always be folding unaryop(splat(x)) -> splat(unaryop(x)) if the unaryop is legal/custom on the scalar type? And then maybe extend that to binop(splat(x),splat(y)) -> splat(binop(x,y)) as well?

Maybe. Not sure. I always find it difficult to see when optimizations like that would be universally beneficial over a wide range of very different architectures, considering how different they can be. I can see that it would make this more general, but it seems easy to think of cases where it would make things worse.

For this case I think it would need to work with the truncated type, not the scalar element type. For a v8i16 the scalar element type would not be legal under AArch64. If it was using the element type it would only handle half of the tests changed here.
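
A small sketch of the truncation point, under the assumption that a v8i16 splat is built from an i32 scalar that is implicitly truncated to 16 bits per lane. It compares one lane before the fold (negate the wide scalar, truncate, negate the lane) with one lane after the fold (truncate the original wide scalar). The function names are hypothetical and only model the arithmetic, not the DAG nodes.

```cpp
#include <cassert>
#include <cstdint>

// Implicit truncation of a wide (i32) scalar into a 16-bit lane.
std::uint16_t truncTo16(std::uint32_t x) {
  return static_cast<std::uint16_t>(x);
}

// Lane-wise negation of a 16-bit lane: 0 - x modulo 2^16.
std::uint16_t neg16(std::uint16_t x) {
  return static_cast<std::uint16_t>(0u - static_cast<std::uint32_t>(x));
}

// Before the fold: negate the wide scalar, splat with truncation, negate the lane.
std::uint16_t laneBefore(std::uint32_t wide) {
  return neg16(truncTo16(0u - wide));
}

// After the fold: splat the original wide scalar with truncation.
std::uint16_t laneAfter(std::uint32_t wide) {
  return truncTo16(wide);
}

// Spot-check that both forms agree even when the upper 16 bits are set.
void checkLanes() {
  assert(laneBefore(0x12345u) == laneAfter(0x12345u));
  assert(laneBefore(0x8000u) == laneAfter(0x8000u));
}
```

Negation commutes with truncation in modular arithmetic, which is why the fold can look through the wider scalar rather than requiring a legal i16 operation.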

Harbormaster completed remote builds in B108161: Diff 350538. Jun 8 2021, 2:58 AM

sdesmalen added inline comments. Jun 8 2021, 4:25 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3313	Can `SelectionDAG::getSplatValue()` be used here? If so, then this doesn't need to be limited to BUILD_VECTOR, as it seems this fold would work equally well for scalable vectors (which use SPLAT_VECTOR).

dmgreen added inline comments. Jun 21 2021, 12:12 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3313	Thanks for taking a look. Yeah, seems to work OK. I'll make it so.

Use getSplatValue.
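
The idea behind the suggestion can be sketched on a miniature IR: one helper recognises a splat whether it is written as a build vector with identical operands or as a dedicated splat node, so the fold no longer depends on the node kind. Everything below (`Op`, `Node`, `make`, `toySplatValue`, `foldNegSplatNeg`) is a hypothetical model for illustration; it is not LLVM's `SDValue` API and not the code that was committed.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical miniature IR, not LLVM's SDNode/SDValue.
enum class Op { Leaf, Neg, BuildVector, SplatVector };

struct Node {
  Op op;
  std::vector<std::shared_ptr<Node>> operands;
};
using NodeRef = std::shared_ptr<Node>;

NodeRef make(Op op, std::vector<NodeRef> operands = {}) {
  return std::make_shared<Node>(Node{op, std::move(operands)});
}

// Returns the splatted scalar for either splat representation, or nullptr.
NodeRef toySplatValue(const NodeRef &v) {
  if (v->op == Op::SplatVector)
    return v->operands[0];
  if (v->op == Op::BuildVector && !v->operands.empty()) {
    for (const NodeRef &o : v->operands)
      if (o != v->operands[0])
        return nullptr; // lanes differ, so this is not a splat
    return v->operands[0];
  }
  return nullptr;
}

// Fold neg(splat(neg(x))) -> splat(x), keeping the splat kind and lane count.
// Returns the input node unchanged when the pattern does not match.
NodeRef foldNegSplatNeg(const NodeRef &n) {
  if (n->op != Op::Neg)
    return n;
  const NodeRef &vec = n->operands[0];
  NodeRef scalar = toySplatValue(vec);
  if (!scalar || scalar->op != Op::Neg)
    return n;
  NodeRef x = scalar->operands[0];
  return make(vec->op, std::vector<NodeRef>(vec->operands.size(), x));
}
```

In this model the same fold handles a four-lane build vector and a single-operand splat node, and it leaves a build vector with differing lanes untouched.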

Harbormaster completed remote builds in B110130: Diff 353282. Jun 21 2021, 12:15 AM

LGTM - cheers

This revision is now accepted and ready to land. Jun 23 2021, 7:02 AM

sdesmalen added inline comments. Jun 24 2021, 4:43 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3312	nit: s/bvsplat/splat/g
llvm/test/CodeGen/AArch64/neon-shift-neg.ll
246	It seems odd to have a scalable-vector test in `neon-shift-neg.ll` ? (I actually can't find this test in latest HEAD, am I looking at the right diff?)

dmgreen added inline comments. Jun 25 2021, 11:30 AM

llvm/test/CodeGen/AArch64/neon-shift-neg.ll
246	Yeah.. The same pattern of negated shifts does not apply for SVE (which makes it a little less useful). I've added a more direct test in 77ae9b364a9d9b99501163761313cefbb345cea7.

This revision was landed with ongoing or failed builds. Jun 25 2021, 11:53 AM

Closed by commit rGb8c8bb07692c: [DAG] Fold neg(splat(neg(x)) -> splat(x) (authored by dmgreen).

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rGb8c8bb07692c: [DAG] Fold neg(splat(neg(x)) -> splat(x).

Revision Contents

Path                                             Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp    8 lines
llvm/test/CodeGen/AArch64/neon-shift-neg.ll      36 lines

Diff 350538

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

(3,302 lines not shown; context: if (isNullOrNullSplat(N0)) {)
     }

     // Convert 0 - abs(x).
     SDValue Result;
     if (N1->getOpcode() == ISD::ABS &&
         !TLI.isOperationLegalOrCustom(ISD::ABS, VT) &&
         TLI.expandABS(N1.getNode(), Result, DAG, true))
       return Result;
+
+    // Fold neg(bvsplat(neg(x)) -> bvsplat(x)
+    if (N1.getOpcode() == ISD::BUILD_VECTOR &&
+        llvm::all_of(N1->ops(),
+                     [&](SDValue Op) { return Op == N1.getOperand(0); }) &&
+        N1.getOperand(0)->getOpcode() == ISD::SUB &&
+        isNullConstant(N1.getOperand(0)->getOperand(0)))
+      return DAG.getSplatBuildVector(VT, DL, N1.getOperand(0)->getOperand(1));
   }

   // Canonicalize (sub -1, x) -> ~x, i.e. (xor x, -1)
   if (isAllOnesOrAllOnesSplat(N0))
     return DAG.getNode(ISD::XOR, DL, VT, N1, N0);

   // fold (A - (0-B)) -> A+B
   if (N1.getOpcode() == ISD::SUB && isNullOrNullSplat(N1.getOperand(0)))
(19,927 lines not shown)

llvm/test/CodeGen/AArch64/neon-shift-neg.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mattr=+neon | FileCheck %s

 define <2 x i64> @shr64x2(<2 x i64> %a, i64 %b) {
 ; CHECK-LABEL: shr64x2:
 ; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: neg x8, x0
-; CHECK-NEXT: dup v1.2d, x8
-; CHECK-NEXT: neg v1.2d, v1.2d
+; CHECK-NEXT: dup v1.2d, x0
 ; CHECK-NEXT: sshl v0.2d, v0.2d, v1.2d
 ; CHECK-NEXT: ret
 entry:
   %sub = sub nsw i64 0, %b
   %splat.splatinsert = insertelement <2 x i64> poison, i64 %sub, i32 0
   %splat.splat = shufflevector <2 x i64> %splat.splatinsert, <2 x i64> poison, <2 x i32> zeroinitializer
   %shr = ashr <2 x i64> %a, %splat.splat
   ret <2 x i64> %shr
 }

 define <4 x i32> @shr32x4(<4 x i32> %a, i32 %b) {
 ; CHECK-LABEL: shr32x4:
 ; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: neg w8, w0
-; CHECK-NEXT: dup v1.4s, w8
-; CHECK-NEXT: neg v1.4s, v1.4s
+; CHECK-NEXT: dup v1.4s, w0
 ; CHECK-NEXT: sshl v0.4s, v0.4s, v1.4s
 ; CHECK-NEXT: ret
 entry:
   %sub = sub nsw i32 0, %b
   %splat.splatinsert = insertelement <4 x i32> poison, i32 %sub, i32 0
   %splat.splat = shufflevector <4 x i32> %splat.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer
   %shr = ashr <4 x i32> %a, %splat.splat
   ret <4 x i32> %shr
 }

 define <4 x i32> @shr32x4undef(<4 x i32> %a, i32 %b) {
 ; CHECK-LABEL: shr32x4undef:
 ; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: neg w8, w0
-; CHECK-NEXT: dup v1.4s, w8
-; CHECK-NEXT: neg v1.4s, v1.4s
+; CHECK-NEXT: dup v1.4s, w0
 ; CHECK-NEXT: sshl v0.4s, v0.4s, v1.4s
 ; CHECK-NEXT: ret
 entry:
   %sub = sub nsw i32 0, %b
   %splat.splatinsert = insertelement <4 x i32> poison, i32 %sub, i32 0
   %splat.splat = shufflevector <4 x i32> %splat.splatinsert, <4 x i32> poison, <4 x i32> <i32 undef, i32 0, i32 0, i32 0>
   %shr = ashr <4 x i32> %a, %splat.splat
   ret <4 x i32> %shr
 }

 define <8 x i16> @shr16x8(<8 x i16> %a, i16 %b) {
 ; CHECK-LABEL: shr16x8:
 ; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: neg w8, w0
-; CHECK-NEXT: dup v1.8h, w8
-; CHECK-NEXT: neg v1.8h, v1.8h
+; CHECK-NEXT: dup v1.8h, w0
 ; CHECK-NEXT: sshl v0.8h, v0.8h, v1.8h
 ; CHECK-NEXT: ret
 entry:
   %sub = sub i16 0, %b
   %0 = insertelement <8 x i16> undef, i16 %sub, i32 0
   %sh_prom = shufflevector <8 x i16> %0, <8 x i16> undef, <8 x i32> zeroinitializer
   %shr = ashr <8 x i16> %a, %sh_prom
   ret <8 x i16> %shr
 }

 define <16 x i8> @shr8x16(<16 x i8> %a, i8 %b) {
 ; CHECK-LABEL: shr8x16:
 ; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: neg w8, w0
-; CHECK-NEXT: dup v1.16b, w8
-; CHECK-NEXT: neg v1.16b, v1.16b
+; CHECK-NEXT: dup v1.16b, w0
 ; CHECK-NEXT: sshl v0.16b, v0.16b, v1.16b
 ; CHECK-NEXT: ret
 entry:
   %sub = sub i8 0, %b
   %0 = insertelement <16 x i8> undef, i8 %sub, i32 0
   %sh_prom = shufflevector <16 x i8> %0, <16 x i8> undef, <16 x i32> zeroinitializer
   %shr = ashr <16 x i8> %a, %sh_prom
   ret <16 x i8> %shr
 }

 define <1 x i64> @shr64x1(<1 x i64> %a, i64 %b) {
 ; CHECK-LABEL: shr64x1:
 ; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: neg x8, x0
-; CHECK-NEXT: fmov d1, x8
-; CHECK-NEXT: neg d1, d1
+; CHECK-NEXT: fmov d1, x0
 ; CHECK-NEXT: sshl d0, d0, d1
 ; CHECK-NEXT: ret
 entry:
   %sub = sub nsw i64 0, %b
   %splat.splatinsert = insertelement <1 x i64> poison, i64 %sub, i32 0
   %shr = ashr <1 x i64> %a, %splat.splatinsert
   ret <1 x i64> %shr
 }

 define <2 x i32> @shr32x2(<2 x i32> %a, i32 %b) {
 ; CHECK-LABEL: shr32x2:
 ; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: neg w8, w0
-; CHECK-NEXT: dup v1.2s, w8
-; CHECK-NEXT: neg v1.2s, v1.2s
+; CHECK-NEXT: dup v1.2s, w0
 ; CHECK-NEXT: sshl v0.2s, v0.2s, v1.2s
 ; CHECK-NEXT: ret
 entry:
   %sub = sub nsw i32 0, %b
   %splat.splatinsert = insertelement <2 x i32> poison, i32 %sub, i32 0
   %splat.splat = shufflevector <2 x i32> %splat.splatinsert, <2 x i32> poison, <2 x i32> zeroinitializer
   %shr = ashr <2 x i32> %a, %splat.splat
   ret <2 x i32> %shr
 }

 define <4 x i16> @shr16x4(<4 x i16> %a, i16 %b) {
 ; CHECK-LABEL: shr16x4:
 ; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: neg w8, w0
-; CHECK-NEXT: dup v1.4h, w8
-; CHECK-NEXT: neg v1.4h, v1.4h
+; CHECK-NEXT: dup v1.4h, w0
 ; CHECK-NEXT: sshl v0.4h, v0.4h, v1.4h
 ; CHECK-NEXT: ret
 entry:
   %sub = sub i16 0, %b
   %0 = insertelement <4 x i16> undef, i16 %sub, i32 0
   %sh_prom = shufflevector <4 x i16> %0, <4 x i16> undef, <4 x i32> zeroinitializer
   %shr = ashr <4 x i16> %a, %sh_prom
   ret <4 x i16> %shr
 }

 define <8 x i8> @shr8x8(<8 x i8> %a, i8 %b) {
 ; CHECK-LABEL: shr8x8:
 ; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: neg w8, w0
-; CHECK-NEXT: dup v1.8b, w8
-; CHECK-NEXT: neg v1.8b, v1.8b
+; CHECK-NEXT: dup v1.8b, w0
 ; CHECK-NEXT: sshl v0.8b, v0.8b, v1.8b
 ; CHECK-NEXT: ret
 entry:
   %sub = sub i8 0, %b
   %0 = insertelement <8 x i8> undef, i8 %sub, i32 0
   %sh_prom = shufflevector <8 x i8> %0, <8 x i8> undef, <8 x i32> zeroinitializer
   %shr = ashr <8 x i8> %a, %sh_prom
   ret <8 x i8> %shr
(111 lines not shown)
 ; CHECK-NEXT: ushl v0.8b, v0.8b, v1.8b
 ; CHECK-NEXT: ret
 entry:
   %sub = sub i8 0, %b
   %0 = insertelement <8 x i8> undef, i8 %sub, i32 0
   %sh_prom = shufflevector <8 x i8> %0, <8 x i8> undef, <8 x i32> zeroinitializer
   %shl = shl <8 x i8> %a, %sh_prom
   ret <8 x i8> %shl
 }