This is an archive of the discontinued LLVM Phabricator instance.

[ARM,MVE] Add InstCombine rules for pred_i2v / pred_v2i.
ClosedPublic

Authored by simon_tatham on Nov 15 2019, 7:25 AM.

Details

Summary

If you're writing C code using the ACLE MVE intrinsics that passes the
result of a vcmp as input to a predicated intrinsic, e.g.

mve_pred16_t pred = vcmpeqq(v1, v2);
v_out = vaddq_m(v_inactive, v3, v4, pred);

then clang's codegen for the compare intrinsic will create calls to
@llvm.arm.mve.pred.v2i to convert the output of icmp into an
mve_pred16_t integer representation, and then the next intrinsic
will call @llvm.arm.mve.pred.i2v to convert it straight back again.
This shows up in the generated code as a vmrs/vmsr pair that
pointlessly moves the predicate value out of p0 and back into it again.

To prevent that, I've added InstCombine rules to remove round trips of
the form v2i(i2v(x)) and i2v(v2i(x)). I've also taught InstCombine
about the known and demanded bits of those intrinsics. As a result,
you now get just the generated code you wanted:

vpt.u16 eq, q1, q2
vaddt.u16 q0, q3, q4
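
To make the shape of the round-trip rules concrete, here is a minimal, self-contained sketch in C. It is a toy model, not the InstCombine code from this patch: the Node type, the NodeKind names and fold_round_trip are all hypothetical, and a node stands in for an IR value only loosely. The sketch assumes that i2v(v2i(x)) may be folded only when x already has the lane count the i2v asks for, and it treats v2i(i2v(x)) as a purely structural fold that ignores any bits of x above the 16 predicate bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy stand-in for an IR value; not an LLVM type. */
typedef enum { NODE_LEAF, NODE_V2I, NODE_I2V } NodeKind;

typedef struct Node {
    NodeKind kind;
    int lanes;            /* lane count of a vector result, 0 for an integer */
    struct Node *operand; /* NULL for leaves */
} Node;

/* Return a node equivalent to n with conversion round trips removed.
 * Operands are folded first, so nested round trips collapse too. */
static Node *fold_round_trip(Node *n) {
    if (n == NULL || n->kind == NODE_LEAF)
        return n;

    /* Conversion nodes always have exactly one operand. */
    assert(n->operand != NULL);
    n->operand = fold_round_trip(n->operand);
    Node *inner = n->operand;

    /* i2v(v2i(x)) -> x, assuming x already has the requested lane count. */
    if (n->kind == NODE_I2V && inner->kind == NODE_V2I &&
        inner->operand->lanes == n->lanes)
        return inner->operand;

    /* v2i(i2v(x)) -> x (structural fold only; see the assumptions above). */
    if (n->kind == NODE_V2I && inner->kind == NODE_I2V)
        return inner->operand;

    return n;
}
```

In this model, the vcmpeqq/vaddq_m pair from the summary corresponds to an i2v node wrapping a v2i node wrapping the compare result, and the fold returns the compare result directly. An i2v that requests a different lane count from the original vector is left alone.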

Event Timeline

simon_tatham created this revision. Nov 15 2019, 7:25 AM

D'oh, moved mve-vpt-from-intrinsics.ll into here from D70297.

Harbormaster completed remote builds in B41032: Diff 229550.
dmgreen accepted this revision. Nov 18 2019, 1:59 AM

Very nice. LGTM

We could also think about adding constant propagation through these, when we already know the incoming value.

This revision is now accepted and ready to land. Nov 18 2019, 1:59 AM
This revision was automatically updated to reflect the committed changes.