This patch folds an AND of a masked load and build vector into a zero extended masked load.
Details
Diff Detail
Event Timeline
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
5299 | Can this make use of the BuildVectorSDNode::getConstantSplatNode or isBuildVectorOfConstantSDNodes or something like it? | |
llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll | ||
2 | It can be good to show before and after in the tests, to make the differences clearer. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
5288 | Move the OneUse evaluations to the end of the conditions as they're the most expensive to check. if (MLoad && BVec && MLoad->getExtensionType() == ISD::EXTLOAD && N0.hasOneUse() && N1.hasOneUse()) | |
5299 | Sorry I'm away from my devmachine atm - I can't remember the APInt method exactly but I think you can do something like: Splat->getAPIntValue()->isMask((uint64_t)ElementSize) |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
5299 | Thanks, that's much cleaner. |
llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll | ||
---|---|---|
2 | It turns out I misunderstood. Here is the difference between codegen with and without this patch. |
llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll | ||
---|---|---|
2 | diff --git a/llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll b/llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll index 5db6637ca81..9696827d846 100644 --- a/llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll +++ b/llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll @@ -7,10 +7,8 @@ define arm_aapcs_vfpcc <4 x float> @foo_v4i16(<4 x i16>* nocapture readonly %pSr ; CHECK-NEXT: vmovlb.s16 q0, q0 ; CHECK-NEXT: vpt.s32 lt, q0, zr ; CHECK-NEXT: vldrht.u32 q0, [r0] -; CHECK-NEXT: vmovlb.u16 q0, q0 ; CHECK-NEXT: vcvt.f32.u32 q0, q0 ; CHECK-NEXT: bx lr -; CHECK-OLD-NEXT: vmovlb.u16 q0, q0 entry: %active.lane.mask = icmp slt <4 x i16> %a, zeroinitializer %wide.masked.load = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* %pSrc, i32 2, <4 x i1> %active.lane.mask, <4 x i16> undef) @@ -24,10 +22,8 @@ define arm_aapcs_vfpcc <8 x half> @foo_v8i8(<8 x i8>* nocapture readonly %pSrc, ; CHECK-NEXT: vmovlb.s8 q0, q0 ; CHECK-NEXT: vpt.s16 lt, q0, zr ; CHECK-NEXT: vldrbt.u16 q0, [r0] -; CHECK-NEXT: vmovlb.u8 q0, q0 ; CHECK-NEXT: vcvt.f16.u16 q0, q0 ; CHECK-NEXT: bx lr -; CHECK-OLD-NEXT: vmovlb.u8 q0, q0 entry: %active.lane.mask = icmp slt <8 x i8> %a, zeroinitializer %wide.masked.load = call <8 x i8> @llvm.masked.load.v8i8.p0v8i8(<8 x i8>* %pSrc, i32 1, <8 x i1> %active.lane.mask, <8 x i8> undef) @@ -39,15 +35,11 @@ define arm_aapcs_vfpcc <4 x float> @foo_v4i8(<4 x i8>* nocapture readonly %pSrc, ; CHECK-LABEL: foo_v4i8: ; CHECK: @ %bb.0: @ %entry ; CHECK-NEXT: vmovlb.s8 q0, q0 -; CHECK-NEXT: vmov.i32 q1, #0xff ; CHECK-NEXT: vmovlb.s16 q0, q0 ; CHECK-NEXT: vpt.s32 lt, q0, zr ; CHECK-NEXT: vldrbt.u32 q0, [r0] -; CHECK-NEXT: vand q0, q0, q1 ; CHECK-NEXT: vcvt.f32.u32 q0, q0 ; CHECK-NEXT: bx lr -; CHECK-OLD-NEXT: vmov.i32 q1, #0xff -; CHECK-OLD-NEXT: vand q0, q0, q1 entry: %active.lane.mask = icmp slt <4 x i8> %a, zeroinitializer %wide.masked.load = call <4 x i8> @llvm.masked.load.v4i8.p0v4i8(<4 x i8>* %pSrc, i32 1, <4 x i1> %active.lane.mask, <4 x i8> undef) |
llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll | ||
---|---|---|
2 | Can you commit this test with trunk's current codegen, then rebase this patch so it shows the delta. |
llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll | ||
---|---|---|
2 | The CHECK-OLD-NEXT lines in there can be ignored. They snuck in to the diff somehow. |
llvm/test/CodeGen/Thumb2/mve-zext-masked-load.ll | ||
---|---|---|
2 | Sure, I'll do so now. |
LGTM with one minor
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
5286 | (style) use auto * |
(style) use auto *