This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Fold predicate into compare
ClosedPublic

Authored by c-rhodes on Jan 6 2022, 8:12 AM.

Details

Summary

Codegen of added testcase before this patch:

ptrue   p0.s
cmpgt   p1.s, p0/z, z0.s, z1.s
cmpge   p2.s, p0/z, z2.s, z1.s
and     p0.b, p0/z, p1.b, p2.b
ret

Patterns originally authored by Will Lovett.

Diff Detail

Event Timeline

c-rhodes created this revision.Jan 6 2022, 8:12 AM
c-rhodes requested review of this revision.Jan 6 2022, 8:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 6 2022, 8:12 AM

Nice improvement @c-rhodes! Just a couple of suggestions ...

llvm/lib/Target/AArch64/SVEInstrFormats.td
4644

I do wonder if it's also worth creating patterns for ands matching the first operand too, i.e.

def : Pat<(predvt (and (AArch64setcc_z (predvt (AArch64ptrue 31)), intvt:$Op2, intvt:$Op3, cc))),
          (cmp $Pg, $Op2, $Op3), predvt:$Pg>;

I'm thinking about the case where only one input to the and instruction is a setcc, i.e. see my comment about writing other test cases below.

llvm/test/CodeGen/AArch64/sve-intrinsics-int-compares.ll
967

Hi @c-rhodes, given your patterns also work for any general predicate for the first operand of and can you also add a testcase that demonstrates this? For example, something like:

define <vscale x 4 x i1> @predicated_icmp_unknown_lhs(<vscale x 4 x i1> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c) {
  %icmp = icmp sle <vscale x 4 x i32> %b, %c
  %and = and <vscale x 4 x i1> %a, %icmp
  ret <vscale x 4 x i1> %and
}

I'd expect to see this code being produced:

; CHECK:       // %bb.0:
; CHECK-NEXT:    cmpge p0.s, p0/z, z2.s, z1.s
; CHECK-NEXT:    ret
c-rhodes updated this revision to Diff 398100.Jan 7 2022, 4:19 AM

Add test Dave suggested

c-rhodes marked an inline comment as done.Jan 7 2022, 4:20 AM
c-rhodes added inline comments.
llvm/lib/Target/AArch64/SVEInstrFormats.td
4644

I do wonder if it's also worth creating patterns for ands matching the first operand too, i.e.

def : Pat<(predvt (and (AArch64setcc_z (predvt (AArch64ptrue 31)), intvt:$Op2, intvt:$Op3, cc))),
          (cmp $Pg, $Op2, $Op3), predvt:$Pg>;

I'm thinking about the case where only one input to the and instruction is a setcc, i.e. see my comment about writing other test cases below.

I've added the testcase you suggested and the codegen is as expected without this pattern, so I think we're good unless I'm missing something?

c-rhodes updated this revision to Diff 398111.Jan 7 2022, 5:33 AM

Add further test for RHS

david-arm accepted this revision.Jan 7 2022, 5:34 AM

LGTM!. Thanks for adding the tests @c-rhodes. :)

This revision is now accepted and ready to land.Jan 7 2022, 5:34 AM
Matt added a subscriber: Matt.Jan 7 2022, 7:30 AM
This revision was automatically updated to reflect the committed changes.