Details

Reviewers

samparker
dmgreen
SjoerdMeijer
t.p.northover
simon_tatham
olista01
craig.topper
spatel
RKSimon

Commits

rG15e880a04fcf: [DAGCombiner] Fold an AND of a masked load into a zext_masked_load

Summary

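This patch folds an AND of a masked load and a build vector into a zero-extended masked load. The fold applies when the build vector is a constant splat that masks exactly the low bits of the loaded element type, so that the AND is equivalent to a zero extension of the loaded value.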
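
To see why such an AND can be absorbed into the load, here is a minimal standalone sketch in plain C++ (not LLVM code). It models a single lane of an any-extending 8-bit load, whose high bits are unspecified, and checks that ANDing with 0xFF gives the same value as a zero-extending load. The helper names `anyExtend8To32`, `zeroExtend8To32`, and `andMatchesZext`, and the `junk` parameter, are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one lane of an any-extending 8-bit load: the low
// 8 bits come from memory, the high 24 bits are unspecified ("junk").
inline uint32_t anyExtend8To32(uint8_t mem, uint32_t junk) {
  return (junk & 0xFFFFFF00u) | mem;
}

// Model of one lane of a zero-extending 8-bit load: the high bits are zero.
inline uint32_t zeroExtend8To32(uint8_t mem) {
  return static_cast<uint32_t>(mem);
}

// AND with 0xFF clears whatever the any-extend left in the high bits, so the
// result matches the zero-extending load for every junk pattern.
inline bool andMatchesZext(uint8_t mem, uint32_t junk) {
  return (anyExtend8To32(mem, junk) & 0xFFu) == zeroExtend8To32(mem);
}
```

This mirrors the foo_v4i8 test in this revision, where the `vmov.i32 q1, #0xff` and `vand q0, q0, q1` pair disappears from the expected output.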

Event Timeline

samtebbs created this revision. Aug 28 2020, 8:25 AM

Herald added a project: Restricted Project. Aug 28 2020, 8:25 AM

Herald added subscribers: llvm-commits, ecnelises, hiraditya.

samtebbs requested review of this revision. Aug 28 2020, 8:25 AM

Harbormaster completed remote builds in B69922: Diff 288615. Aug 28 2020, 8:59 AM

dmgreen added reviewers: craig.topper, spatel, RKSimon. Aug 29 2020, 6:07 AM

dmgreen added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5302	Can this make use of the BuildVectorSDNode::getConstantSplatNode or isBuildVectorOfConstantSDNodes or something like it?
llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
2	It can be good to show before and after in the tests, to make the differences clearer.
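
The first comment above asks for a constant-splat query instead of hand-rolled element inspection. As a rough standalone illustration of what such a query answers, the sketch below returns the shared value when every element of a vector is the same constant. The name `constantSplat` and the use of `std::vector<uint64_t>` are assumptions for this example. It is not the LLVM implementation and ignores details such as undefined elements. It needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper modelling a constant-splat query: returns the shared
// value when every element is the same constant, and std::nullopt otherwise
// (including for an empty vector).
inline std::optional<uint64_t>
constantSplat(const std::vector<uint64_t> &elts) {
  if (elts.empty())
    return std::nullopt;
  for (uint64_t e : elts)
    if (e != elts.front())
      return std::nullopt;
  return elts.front();
}
```

With a single splat value in hand, the combine only has to test one constant rather than every element of the build vector.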

Herald added a subscriber: danielkiss. Aug 29 2020, 6:07 AM

Improved the BuildVec processing and added old checks to test.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5302	Using that is much better, thanks.
llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
2	I've added extra checks to show what was generated before.

RKSimon added inline comments. Sep 1 2020, 4:33 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5291	Move the OneUse evaluations to the end of the conditions as they're the most expensive to check. if (MLoad && BVec && MLoad->getExtensionType() == ISD::EXTLOAD && N0.hasOneUse() && N1.hasOneUse())
5302	Sorry I'm away from my devmachine atm - I can't remember the APInt method exactly but I think you can do something like: Splat->getAPIntValue()->isMask((uint64_t)ElementSize)
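
The low-bit mask test suggested above can be modelled without LLVM. The sketch below is an assumption-laden stand-in for that kind of check, not the `APInt` implementation: `isLowBitMask` is a hypothetical name, and it works on a plain `uint64_t`. It returns true when a value has exactly its `numBits` lowest bits set and nothing else. In the diff that landed, the call is written as `Splat->getAPIntValue().isMask((uint64_t)ElementSize)`.

```cpp
#include <cassert>
#include <cstdint>

// Standalone model (not the LLVM implementation) of a low-bit mask test:
// true when `value` has exactly its `numBits` lowest bits set and no others.
// Widths of 0 or more than 64 bits are rejected.
inline bool isLowBitMask(uint64_t value, unsigned numBits) {
  if (numBits == 0 || numBits > 64)
    return false;
  // Shifting a 64-bit value by 64 is undefined, so handle that width apart.
  uint64_t expected =
      (numBits == 64) ? ~uint64_t{0} : ((uint64_t{1} << numBits) - 1);
  return value == expected;
}
```

For an i8 memory element extended to i32, the splat must therefore be exactly 0xFF; a splat of 0xFE or 0xFFFF would not qualify.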

Use isMask and move hasOneUse checks
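
Moving the hasOneUse checks to the end relies on `&&` short-circuiting: once an earlier operand is false, later operands are never evaluated. The sketch below illustrates that ordering effect with a hypothetical `Probe` type whose `expensive` member counts how often it runs. None of these names come from LLVM.

```cpp
#include <cassert>

// Hypothetical stand-ins for the checks in the combine; the counter records
// how often the "expensive" check actually runs.
struct Probe {
  int expensiveCalls = 0;
  bool cheap(bool result) const { return result; }
  bool expensive(bool result) {
    ++expensiveCalls;
    return result;
  }
};

// Cheap check first: && short-circuits, so expensive() is skipped whenever
// cheap() already fails.
inline bool guard(Probe &p, bool cheapResult, bool expensiveResult) {
  return p.cheap(cheapResult) && p.expensive(expensiveResult);
}
```

Calling `guard(p, false, true)` leaves `p.expensiveCalls` at zero, while `guard(p, true, true)` increments it once.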

samtebbs marked 4 inline comments as done. Sep 1 2020, 6:16 AM

samtebbs added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5302	Thanks, that's much cleaner.

samtebbs marked an inline comment as done. Sep 1 2020, 6:52 AM

samtebbs added inline comments.

llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
2	It turns out I misunderstood. Here is the difference between codegen with and without this patch. (Attachment: test-diff, 881 KB)

samtebbs added inline comments. Sep 1 2020, 6:55 AM

llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll

diff --git a/llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll b/llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
index 5db6637ca81..9696827d846 100644
--- a/llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
+++ b/llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
@@ -7,10 +7,8 @@ define arm_aapcs_vfpcc <4 x float> @foo_v4i16(<4 x i16>* nocapture readonly %pSr
 ; CHECK-NEXT:    vmovlb.s16 q0, q0
 ; CHECK-NEXT:    vpt.s32 lt, q0, zr
 ; CHECK-NEXT:    vldrht.u32 q0, [r0]
-; CHECK-NEXT:    vmovlb.u16 q0, q0
 ; CHECK-NEXT:    vcvt.f32.u32 q0, q0
 ; CHECK-NEXT:    bx lr
-; CHECK-OLD-NEXT:    vmovlb.u16 q0, q0
 entry:
   %active.lane.mask = icmp slt <4 x i16> %a, zeroinitializer
   %wide.masked.load = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* %pSrc, i32 2, <4 x i1> %active.lane.mask, <4 x i16> undef)
@@ -24,10 +22,8 @@ define arm_aapcs_vfpcc <8 x half> @foo_v8i8(<8 x i8>* nocapture readonly %pSrc,
 ; CHECK-NEXT:    vmovlb.s8 q0, q0
 ; CHECK-NEXT:    vpt.s16 lt, q0, zr
 ; CHECK-NEXT:    vldrbt.u16 q0, [r0]
-; CHECK-NEXT:    vmovlb.u8 q0, q0
 ; CHECK-NEXT:    vcvt.f16.u16 q0, q0
 ; CHECK-NEXT:    bx lr
-; CHECK-OLD-NEXT:    vmovlb.u8 q0, q0
 entry:
   %active.lane.mask = icmp slt <8 x i8> %a, zeroinitializer
   %wide.masked.load = call <8 x i8> @llvm.masked.load.v8i8.p0v8i8(<8 x i8>* %pSrc, i32 1, <8 x i1> %active.lane.mask, <8 x i8> undef)
@@ -39,15 +35,11 @@ define arm_aapcs_vfpcc <4 x float> @foo_v4i8(<4 x i8>* nocapture readonly %pSrc,
 ; CHECK-LABEL: foo_v4i8:
 ; CHECK:       @ %bb.0: @ %entry
 ; CHECK-NEXT:    vmovlb.s8 q0, q0
-; CHECK-NEXT:    vmov.i32 q1, #0xff
 ; CHECK-NEXT:    vmovlb.s16 q0, q0
 ; CHECK-NEXT:    vpt.s32 lt, q0, zr
 ; CHECK-NEXT:    vldrbt.u32 q0, [r0]
-; CHECK-NEXT:    vand q0, q0, q1
 ; CHECK-NEXT:    vcvt.f32.u32 q0, q0
 ; CHECK-NEXT:    bx lr
-; CHECK-OLD-NEXT:    vmov.i32 q1, #0xff
-; CHECK-OLD-NEXT:    vand q0, q0, q1
 entry:
   %active.lane.mask = icmp slt <4 x i8> %a, zeroinitializer
   %wide.masked.load = call <4 x i8> @llvm.masked.load.v4i8.p0v4i8(<4 x i8>* %pSrc, i32 1, <4 x i1> %active.lane.mask, <4 x i8> undef)

RKSimon added inline comments. Sep 1 2020, 6:58 AM

llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
2	Can you commit this test with trunk's current codegen, then rebase this patch so it shows the delta.

samtebbs added inline comments. Sep 1 2020, 7:17 AM

llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
2	The CHECK-OLD-NEXT lines in there can be ignored. They snuck into the diff somehow.

samtebbs added inline comments. Sep 1 2020, 7:23 AM

llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
2	Sure, I'll do so now.

Show difference to previous codegen.

LGTM with one minor

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5289	(style) use auto *

This revision is now accepted and ready to land. Sep 1 2020, 8:00 AM

Closed by commit rG15e880a04fcf: [DAGCombiner] Fold an AND of a masked load into a zext_masked_load (authored by samtebbs). Sep 1 2020, 9:02 AM

This revision was automatically updated to reflect the committed changes.

samtebbs added a commit: rG15e880a04fcf: [DAGCombiner] Fold an AND of a masked load into a zext_masked_load.

Diff 289165

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

(5,277 preceding lines not shown)

     if (ISD::isBuildVectorAllZeros(N1.getNode()))
       return DAG.getConstant(APInt::getNullValue(N1.getScalarValueSizeInBits()),
                              SDLoc(N), N1.getValueType());

     // fold (and x, -1) -> x, vector edition
     if (ISD::isBuildVectorAllOnes(N0.getNode()))
       return N1;
     if (ISD::isBuildVectorAllOnes(N1.getNode()))
       return N0;
+
+    // fold (and (masked_load) (build_vec (x, ...))) to zext_masked_load
+    MaskedLoadSDNode *MLoad = dyn_cast<MaskedLoadSDNode>(N0);
+    BuildVectorSDNode *BVec = dyn_cast<BuildVectorSDNode>(N1);
+    if (MLoad && BVec && MLoad->getExtensionType() == ISD::EXTLOAD &&
+        N0.hasOneUse() && N1.hasOneUse()) {
+      EVT LoadVT = MLoad->getMemoryVT();
+      EVT ExtVT = VT;
+      if (TLI.isLoadExtLegal(ISD::ZEXTLOAD, ExtVT, LoadVT)) {
+        // For this AND to be a zero extension of the masked load the elements
+        // of the BuildVec must mask the bottom bits of the extended element
+        // type
+        if (ConstantSDNode *Splat = BVec->getConstantSplatNode()) {
+          TypeSize ElementSize =
+              LoadVT.getVectorElementType().getScalarSizeInBits();
+          if (Splat->getAPIntValue().isMask((uint64_t)ElementSize)) {
+            return DAG.getMaskedLoad(
+                ExtVT, SDLoc(N), MLoad->getChain(), MLoad->getBasePtr(),
+                MLoad->getOffset(), MLoad->getMask(), MLoad->getPassThru(),
+                LoadVT, MLoad->getMemOperand(), MLoad->getAddressingMode(),
+                ISD::ZEXTLOAD, MLoad->isExpandingLoad());
+          }
+        }
+      }
+    }
   }

   // fold (and c1, c2) -> c1&c2
   ConstantSDNode *N1C = isConstOrConstSplat(N1);
   if (SDValue C = DAG.FoldConstantArithmetic(ISD::AND, SDLoc(N), VT, {N0, N1}))
     return C;

   // canonicalize constant to RHS

(16,953 following lines not shown)

llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp,+fp64 -verify-machineinstrs -o - %s | FileCheck %s

 define arm_aapcs_vfpcc <4 x float> @foo_v4i16(<4 x i16>* nocapture readonly %pSrc, <4 x i16> %a) {
 ; CHECK-LABEL: foo_v4i16:
 ; CHECK:       @ %bb.0: @ %entry
 ; CHECK-NEXT:    vmovlb.s16 q0, q0
 ; CHECK-NEXT:    vpt.s32 lt, q0, zr
 ; CHECK-NEXT:    vldrht.u32 q0, [r0]
-; CHECK-NEXT:    vmovlb.u16 q0, q0
 ; CHECK-NEXT:    vcvt.f32.u32 q0, q0
 ; CHECK-NEXT:    bx lr
 entry:
   %active.lane.mask = icmp slt <4 x i16> %a, zeroinitializer
   %wide.masked.load = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* %pSrc, i32 2, <4 x i1> %active.lane.mask, <4 x i16> undef)
   %0 = uitofp <4 x i16> %wide.masked.load to <4 x float>
   ret <4 x float> %0
 }

 define arm_aapcs_vfpcc <8 x half> @foo_v8i8(<8 x i8>* nocapture readonly %pSrc, i32 %blockSize, <8 x i8> %a) {
 ; CHECK-LABEL: foo_v8i8:
 ; CHECK:       @ %bb.0: @ %entry
 ; CHECK-NEXT:    vmovlb.s8 q0, q0
 ; CHECK-NEXT:    vpt.s16 lt, q0, zr
 ; CHECK-NEXT:    vldrbt.u16 q0, [r0]
-; CHECK-NEXT:    vmovlb.u8 q0, q0
 ; CHECK-NEXT:    vcvt.f16.u16 q0, q0
 ; CHECK-NEXT:    bx lr
 entry:
   %active.lane.mask = icmp slt <8 x i8> %a, zeroinitializer
   %wide.masked.load = call <8 x i8> @llvm.masked.load.v8i8.p0v8i8(<8 x i8>* %pSrc, i32 1, <8 x i1> %active.lane.mask, <8 x i8> undef)
   %0 = uitofp <8 x i8> %wide.masked.load to <8 x half>
   ret <8 x half> %0
 }

 define arm_aapcs_vfpcc <4 x float> @foo_v4i8(<4 x i8>* nocapture readonly %pSrc, i32 %blockSize, <4 x i8> %a) {
 ; CHECK-LABEL: foo_v4i8:
 ; CHECK:       @ %bb.0: @ %entry
 ; CHECK-NEXT:    vmovlb.s8 q0, q0
-; CHECK-NEXT:    vmov.i32 q1, #0xff
 ; CHECK-NEXT:    vmovlb.s16 q0, q0
 ; CHECK-NEXT:    vpt.s32 lt, q0, zr
 ; CHECK-NEXT:    vldrbt.u32 q0, [r0]
-; CHECK-NEXT:    vand q0, q0, q1
 ; CHECK-NEXT:    vcvt.f32.u32 q0, q0
 ; CHECK-NEXT:    bx lr
 entry:
   %active.lane.mask = icmp slt <4 x i8> %a, zeroinitializer
   %wide.masked.load = call <4 x i8> @llvm.masked.load.v4i8.p0v4i8(<4 x i8>* %pSrc, i32 1, <4 x i1> %active.lane.mask, <4 x i8> undef)
   %0 = uitofp <4 x i8> %wide.masked.load to <4 x float>
   ret <4 x float> %0
 }

(60 following lines not shown)

This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Fold an AND of a masked load into a zext_masked_load
Closed, Public
