This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Fold an AND of a masked load into a zext_masked_load
ClosedPublic

Authored by samtebbs on Aug 28 2020, 8:25 AM.

Details

Summary

This patch folds an AND of a masked load and build vector into a zero extended masked load.

Diff Detail

Event Timeline

samtebbs created this revision.Aug 28 2020, 8:25 AM
samtebbs requested review of this revision.Aug 28 2020, 8:25 AM
dmgreen added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5302

Can this make use of the BuildVectorSDNode::getConstantSplatNode or isBuildVectorOfConstantSDNodes or something like it?

llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
3

It can be good to show before and after in the tests, to make the differences clearer.

samtebbs updated this revision to Diff 289130.Sep 1 2020, 4:13 AM

Improved the BuildVec processing and added old checks to test.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5302

Using that is much better, thanks.

llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
3

I've added extra checks to show what was generated before.

RKSimon added inline comments.Sep 1 2020, 4:33 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5291

Move the OneUse evaluations to the end of the conditions as they're the most expensive to check.

if (MLoad && BVec && MLoad->getExtensionType() == ISD::EXTLOAD && N0.hasOneUse() && N1.hasOneUse())
5302

Sorry I'm away from my devmachine atm - I can't remember the APInt method exactly but I think you can do something like:

Splat->getAPIntValue()->isMask((uint64_t)ElementSize)
samtebbs updated this revision to Diff 289147.Sep 1 2020, 6:15 AM

Use isMask and move hasOneUse checks

samtebbs marked 4 inline comments as done.Sep 1 2020, 6:16 AM
samtebbs added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5302

Thanks, that's much cleaner.

samtebbs marked an inline comment as done.Sep 1 2020, 6:52 AM
samtebbs added inline comments.
llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
3

It turns out I misunderstood. Here is the difference between codegen with and without this patch.

samtebbs added inline comments.Sep 1 2020, 6:55 AM
llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
3
diff --git a/llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll b/llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
index 5db6637ca81..9696827d846 100644
--- a/llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
+++ b/llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
@@ -7,10 +7,8 @@ define arm_aapcs_vfpcc <4 x float> @foo_v4i16(<4 x i16>* nocapture readonly %pSr
 ; CHECK-NEXT:    vmovlb.s16 q0, q0
 ; CHECK-NEXT:    vpt.s32 lt, q0, zr
 ; CHECK-NEXT:    vldrht.u32 q0, [r0]
-; CHECK-NEXT:    vmovlb.u16 q0, q0
 ; CHECK-NEXT:    vcvt.f32.u32 q0, q0
 ; CHECK-NEXT:    bx lr
-; CHECK-OLD-NEXT:    vmovlb.u16 q0, q0
 entry:
   %active.lane.mask = icmp slt <4 x i16> %a, zeroinitializer
   %wide.masked.load = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* %pSrc, i32 2, <4 x i1> %active.lane.mask, <4 x i16> undef)
@@ -24,10 +22,8 @@ define arm_aapcs_vfpcc <8 x half> @foo_v8i8(<8 x i8>* nocapture readonly %pSrc,
 ; CHECK-NEXT:    vmovlb.s8 q0, q0
 ; CHECK-NEXT:    vpt.s16 lt, q0, zr
 ; CHECK-NEXT:    vldrbt.u16 q0, [r0]
-; CHECK-NEXT:    vmovlb.u8 q0, q0
 ; CHECK-NEXT:    vcvt.f16.u16 q0, q0
 ; CHECK-NEXT:    bx lr
-; CHECK-OLD-NEXT:    vmovlb.u8 q0, q0
 entry:
   %active.lane.mask = icmp slt <8 x i8> %a, zeroinitializer
   %wide.masked.load = call <8 x i8> @llvm.masked.load.v8i8.p0v8i8(<8 x i8>* %pSrc, i32 1, <8 x i1> %active.lane.mask, <8 x i8> undef)
@@ -39,15 +35,11 @@ define arm_aapcs_vfpcc <4 x float> @foo_v4i8(<4 x i8>* nocapture readonly %pSrc,
 ; CHECK-LABEL: foo_v4i8:
 ; CHECK:       @ %bb.0: @ %entry
 ; CHECK-NEXT:    vmovlb.s8 q0, q0
-; CHECK-NEXT:    vmov.i32 q1, #0xff
 ; CHECK-NEXT:    vmovlb.s16 q0, q0
 ; CHECK-NEXT:    vpt.s32 lt, q0, zr
 ; CHECK-NEXT:    vldrbt.u32 q0, [r0]
-; CHECK-NEXT:    vand q0, q0, q1
 ; CHECK-NEXT:    vcvt.f32.u32 q0, q0
 ; CHECK-NEXT:    bx lr
-; CHECK-OLD-NEXT:    vmov.i32 q1, #0xff
-; CHECK-OLD-NEXT:    vand q0, q0, q1
 entry:
   %active.lane.mask = icmp slt <4 x i8> %a, zeroinitializer
   %wide.masked.load = call <4 x i8> @llvm.masked.load.v4i8.p0v4i8(<4 x i8>* %pSrc, i32 1, <4 x i1> %active.lane.mask, <4 x i8> undef)
RKSimon added inline comments.Sep 1 2020, 6:58 AM
llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
3

Can you commit this test with trunk's current codegen, then rebase this patch so it shows the delta.

samtebbs added inline comments.Sep 1 2020, 7:17 AM
llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
3

The CHECK-OLD-NEXT lines in there can be ignored. They snuck in to the diff somehow.

samtebbs added inline comments.Sep 1 2020, 7:23 AM
llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll
3

Sure, I'll do so now.

samtebbs updated this revision to Diff 289165.Sep 1 2020, 7:49 AM

Show difference to previous codegen.

RKSimon accepted this revision.Sep 1 2020, 8:00 AM

LGTM with one minor

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5289

(style) use auto *

This revision is now accepted and ready to land.Sep 1 2020, 8:00 AM