# [AArch64][SVE] Invert VSelect operand order and condition for predicated arithmetic operationsClosedPublicActions

Authored by MattDevereau on Feb 10 2022, 3:37 AM.

# Details

Reviewers
 peterwaller-arm paulwalker-arm bsmith DavidTruby david-arm efriedma kmclaughlin
Commits
rG2f2dcb4fb134: [AArch64][SVE] Invert VSelect operand order and condition for predicated…
Summary

[AArch64][SVE] Invert VSelect operand order and condition for predicated arithmetic operations

```(vselect (setcc ( condcode) (_) (_)) (a)          (op (a) (b)))
=> (vselect (setcc (!condcode) (_) (_)) (op (a) (b)) (a))```

As a follow up to D117689, invert the operand order and condition
in order to fold vselects into predicated instructions.

# Diff Detail

### Event Timeline

MattDevereau created this revision.Feb 10 2022, 3:37 AM
MattDevereau requested review of this revision.Feb 10 2022, 3:37 AM
Herald added a project: Restricted Project. Feb 10 2022, 3:37 AM

Adding @kmclaughlin as she wrote the original sve-fp-reciprocal tests.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17156–17157

I believe this transform would loop given something like:

```m = fmul a, b
p = setcc <cond> m, 0
vselect p, m, m```
17160

The comment above describing this transform isn't accurate as it doesn't reflect these restrictions around setcc.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17156–17157

isnt vselect p, m, m a nop?

i've created a test for the example which doesn't loop

```define <vscale x 4 x float> @fcmp_select_f32_double_op(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
; CHECK-LABEL: fcmp_select_f32_double_op:
; CHECK:       // %bb.0:
; CHECK-NEXT:    fmul z0.s, z0.s, z1.s
; CHECK-NEXT:    ret
%m = fmul <vscale x 4 x float> %a, %b
%fcmp = fcmp oeq <vscale x 4 x float> %m, zeroinitializer
%sel = select <vscale x 4 x i1> %fcmp, <vscale x 4 x float> %m, <vscale x 4 x float> %m
ret <vscale x 4 x float> %sel
}```
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17156–17157

It likely will get removed as redundant yes, I just worry about things like this that could end up getting through in esoteric cases.

MattDevereau marked 2 inline comments as done.Feb 10 2022, 6:34 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17156–17157

added test fcmp_select_f32_double_op to llvm/test/CodeGen/AArch64/sve-select.ll and added condition if (SetCCOp0 == NOp2) return None; after condition SetCCOp0 != NOp1

17160

updated comment to
(vselect (setcc (a) (0)) (a) (op (a) (b)))
=> (vselect (not setcc (a) (0)) (op (a) (b)) (a))

Do you want to perform this combine for all vector types? You want the combine for an SVE specific reason and thus I'm wondering if it's better to restrict the combine to scalable vectors? Also, do use counts need to play a role here? I'm thinking that you might not want to flip the condition if it means generating additional compare instructions.

paulwalker-arm added a comment.EditedFeb 11 2022, 3:04 AM

Just a suggestion but another option regarding my scalable vectors only comment is that for SVE we lower all the floating point operations to predicated nodes so you could have a post lowering combine that looks for FADD_PRED rather than FADD. Not sure if there's a huge benefit to this but given you're trying to produce something more isel friendly, having the combine as close to isel as possible is perhaps beneficial. I guess it just depends on if the extra predicate used by FADD_PRED makes the combine awkward/ugly.

MattDevereau marked 2 inline comments as done.

Added constraints for scalable vector types and one setcc use only

@paulwalker-arm I replaced ISD::FMUL etc with AArch64::FMUL_PRED however it failed to do the combine afterwards

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17155–17156

I don't think this pattern should be sensitive to the contents of the setcc, so we don't need to be looking for a 0 nor an 0.

17168–17169

Per above, I don't think this should read operands of the setcc, except for inverting the condcode at the end.

The combine condition wants to be something which matches when op appears on the right, and op's left operand is equal to the left operand of the vselect. This naturally prevents an infinite loop because it's not possible for the VSelect.LHS == VSelect.RHS.LHS to be true before and after the swap; and it's not true of VSelect.LHS == VSelect.RHS. (For these things to be true there would have to be a cycle of values, which is not allowed in the IR DAG).

I might find this a bit easier to read with LHS/RHS naming convention, since the vselect has an op0 which is the condcode, so its Op1 is the LHS of the vselect, whereas the OpOp0 is the LHS for the op.

So my suggestions for some clearer naming, if you need those things:
NOp1 => SelectA
NOp2 => SelectB
OpOp0 => OpLHS

Then the condition to perform the combine is SelectA == OpLHS, if SelectB.Opcode could profit from the transformation.

llvm/test/CodeGen/AArch64/sve-select.ll
654

Extraneous?

Removed SetCCOp0 == NOp2 and SetCCOp0 != NOp1 exit conditions
Added SelectA != SelectB.getOperand(0) exit condition

Functionally I think it's looking reasonable to me. A few more stylistic nits.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17155–17156

Just checking if you saw the suggestion above this comment, which adds rationale and makes the select pattern comment a little easier to read.

17157

SDValue has Optional-like semantics built in: an SDValue() evaluates to false, so the optional is unnecessary here (I didn't see any other cases of Optional being used as a return argument like this in this file).

17164–17165

This if statement has a mix of conditions, referring to different things. It would be slightly better if it were grouped so that the setcc ones are next to each other at least. Better still, I might hoist the scalable query up to the top:

```auto NTy = N->getValueType(0);
if (!NTy.isScalableVector())
return None;```

My concern is that the condition is hiding in there, someone scanning their eyes vertically at the condition might see 'setcc, setcc' on adjacent lines, and think that all of the conditions relate to setcc, where they do not.

17182

SetCCOp0 is named here but the SetCC.getOperand(1) is not assigned a variable, so I'd drop the variable in this case because it is both single use and doesn't add any extra information.

17197–17198

Name clarity: inverting the vselect sounds like (not vselect). The object being inverted is the setcc condition code. Suggestion: A better name might be trySwapVSelectOperands?

MattDevereau edited the summary of this revision. (Show Details)
Herald added a subscriber: ctetreau. Feb 16 2022, 2:32 AM
MattDevereau edited the summary of this revision. (Show Details)Feb 16 2022, 2:33 AM
MattDevereau edited the summary of this revision. (Show Details)
peterwaller-arm accepted this revision.Feb 16 2022, 3:02 AM