When lowering the llvm.get.active.lane.mask intrinsic with a fixed-width
predicate vector result, we can make use of the SVE whilelo
instruction when SVE is enabled. We do this by choosing
a sensible VT for the whilelo instruction, then promoting the predicate
result to an integer vector, i.e. nxv16i1 -> nxv16i8. We can then extract a v16i8
subvector and truncate back to the original return type, i.e. v16i1.
This leads to a significant improvement in code quality.
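As a rough illustration (a hypothetical test in the spirit of those under review, not one taken from the patch), a fixed-width use of the intrinsic looks like this in IR; with this change and SVE enabled, the mask can be generated with a single whilelo on the scalable container type rather than an expansion into scalar compares:

```llvm
declare <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64, i64)

; Per the summary: the v16i1 result comes from whilelo producing nxv16i1,
; promoted to nxv16i8, extracted as a v16i8 subvector, then truncated back.
define <16 x i1> @lane_mask_v16i1(i64 %index, i64 %tc) {
  %mask = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64 %index, i64 %tc)
  ret <16 x i1> %mask
}
```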
Event Timeline
I'm not sure the testcases actually illustrate the cases we care about. Generally, I would expect the result of llvm.get.active.lane.mask() to be used in a select instruction, or a masked load, or something like that. And in that case, I'm not sure the way you're choosing the VT is appropriate; the instruction using the mask is probably not going to expect a 64-bit vector.
Hi @efriedma, I think these testcases are still useful by themselves because they are succinct and make it easy to see how one IR instruction maps to assembly. At the moment, if I add more complex test cases involving a select, for example, the code quality ends up being awful regardless of which promoted VT I choose. I think there is still a codegen issue somewhere, because I see loads of pointless lane moves whenever I add something like a select. So for now, I'd like to leave the tests as they are.
However, I do take your point about trying to second-guess how the masks are going to be used, and perhaps I can make the choice of promoted VT simpler for now and leave the xtn instructions in.
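For reference, the kind of select-based test being discussed would look something like this (a hypothetical example, not a test from the patch):

```llvm
declare <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64, i64)

; The mask feeds a vector select, so the consumer expects a v16i8-sized
; comparison result rather than the raw predicate.
define <16 x i8> @lane_mask_select(i64 %index, i64 %tc, <16 x i8> %a, <16 x i8> %b) {
  %mask = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64 %index, i64 %tc)
  %sel = select <16 x i1> %mask, <16 x i8> %a, <16 x i8> %b
  ret <16 x i8> %sel
}
```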
- Removed code to optimise VT for NEON, since we don't know how the result is actually going to be used.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15095
It looks like it is worth adding an assert that SVE is available too.
nit: avoid indentation by doing `if (!VT.isFixedLengthVector()) return SDValue();`?