Page MenuHomePhabricator

[AArch64][SVE] Improve code generation for VLS i1 masks
Needs ReviewPublic

Authored by DavidTruby on Oct 6 2021, 5:33 AM.

Details

Summary

This patch partially resolves an issue for VLS code generation
where a mask is generated from a smaller width integer comparison
than the instruction using the mask requires.

Instead of sign extending a p register by converting it to a z
register, extending that, and converting back, we instead just
do an unpack of the p register.

A separate issue causes the code generation to still be poor when
the mask generation would fit in a neon register, as we then use
a neon comparison operation and have to convert that to a p register.
This will be resolved in a separate patch.

Diff Detail

Event Timeline

DavidTruby created this revision.Oct 6 2021, 5:33 AM
DavidTruby requested review of this revision.Oct 6 2021, 5:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 6 2021, 5:33 AM
peterwaller-arm added inline comments.Oct 6 2021, 6:24 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15737

nit. As many of the other combines have, please could you introduce a comment showing the form of the intended combine?

16429

nit. As many of the other combines have, please could you introduce a comment showing the form of the combine?

paulwalker-arm added inline comments.Oct 6 2021, 9:49 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16437

I believe this transform is sound for a normal SETCC but N is a SETCC_MERGE_ZERO and so consideration must be paid to the predicate to ensure the inactive lanes get zero'd.

The easiest way to make the combine safe is to check Pred is all active. The problem with doing that is it'll probably mean the combine doesn't fire for the case you care about.

Matt added a subscriber: Matt.Oct 6 2021, 10:45 AM

Added an extra condition to make the transformation sound.

In essence, we need to check that the predicate is the same
before and after the transform, modulo its size since we're
moving from a smaller to larger predicate.

DavidTruby marked an inline comment as done.Oct 12 2021, 5:40 AM
bsmith added inline comments.Oct 25 2021, 3:34 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
36

Rogue include?

16439

Is this assuming that Pred and OrigPred and ptrues? If they're not, won't this break?

Add additional check that predicates are ptrues

DavidTruby marked 2 inline comments as done.Oct 27 2021, 5:57 AM

Add negative tests for the case where the ptrue vl is not the same, or where
one of the predicates is not a ptrue at all.

llvm/test/CodeGen/AArch64/sve-punpklo-combine.ll
29

ptrue

Improve added ACLE intrinsic tests

Fix spelling mistake

Fix tests after spelling mistake (whoops...)

llvm/test/CodeGen/AArch64/sve-punpklo-combine.ll
18

Unused %dup? (I guess you replaced it with zeroinitializer?)

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15737

This comment doesn't match the code below. At the DAG level this is really sunpklo(sext(pred)) -> sext(extract_low_half(pred)).

Also, is this transform universally good? As in, if the later transformation does not occur, is punpklo always better than sunpklo?

15745

Can you be explicit here given we know we always want a sign extend?

16438

Perhaps I've confused myself with the number of getOperand() calls but this doesn't looks quite right:

LHS == ISD::SIGN_EXTEND
LHS->getOperand(0) == ISD::EXTRACT_SUBVECTOR
LHS->getOperand(0)->getOperand(0) == vector we're extracting the subvector from
LHS->getOperand(0)->getOperand(0)->getOperand(0) == ???? and how do you know LHS->getOperand(0)->getOperand(0) has an operand to get?

I can see the following code then ensures LHS->getOperand(0)->getOperand(0)->getOperand(0) is a PTRUE but before then depending on LHS->getOperand(0)->getOperand(0) the extra call to getOperand(0) might assert. Also, is it really the case that is doesn't matter how the PTRUE is processed before being passed to ISD::EXTRACT_SUBVECTOR?