Page MenuHomePhabricator

[AArch64][SVE] Invert VSelect operand order and condition for predicated arithmetic operations

Authored by MattDevereau on Feb 10 2022, 3:37 AM.



[AArch64][SVE] Invert VSelect operand order and condition for predicated arithmetic operations

(vselect (setcc ( condcode) (_) (_)) (a)          (op (a) (b)))
=> (vselect (setcc (!condcode) (_) (_)) (op (a) (b)) (a))

As a follow up to D117689, invert the operand order and condition
in order to fold vselects into predicated instructions.

Diff Detail

Event Timeline

MattDevereau created this revision.Feb 10 2022, 3:37 AM
MattDevereau requested review of this revision.Feb 10 2022, 3:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2022, 3:37 AM
david-arm added a subscriber: kmclaughlin.

Adding @kmclaughlin as she wrote the original sve-fp-reciprocal tests.

bsmith added inline comments.Feb 10 2022, 3:59 AM

I believe this transform would loop given something like:

m = fmul a, b
p = setcc <cond> m, 0
vselect p, m, m

The comment above describing this transform isn't accurate as it doesn't reflect these restrictions around setcc.

MattDevereau added inline comments.Feb 10 2022, 4:11 AM

isnt vselect p, m, m a nop?

i've created a test for the example which doesn't loop

define <vscale x 4 x float> @fcmp_select_f32_double_op(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
; CHECK-LABEL: fcmp_select_f32_double_op:
; CHECK:       // %bb.0:
; CHECK-NEXT:    fmul z0.s, z0.s, z1.s
; CHECK-NEXT:    ret
  %m = fmul <vscale x 4 x float> %a, %b
  %fcmp = fcmp oeq <vscale x 4 x float> %m, zeroinitializer
  %sel = select <vscale x 4 x i1> %fcmp, <vscale x 4 x float> %m, <vscale x 4 x float> %m
  ret <vscale x 4 x float> %sel
bsmith added inline comments.Feb 10 2022, 4:16 AM

It likely will get removed as redundant yes, I just worry about things like this that could end up getting through in esoteric cases.

MattDevereau marked 2 inline comments as done.Feb 10 2022, 6:34 AM
MattDevereau added inline comments.

added test fcmp_select_f32_double_op to llvm/test/CodeGen/AArch64/sve-select.ll and added condition if (SetCCOp0 == NOp2) return None; after condition SetCCOp0 != NOp1


updated comment to
(vselect (setcc (a) (0)) (a) (op (a) (b)))
=> (vselect (not setcc (a) (0)) (op (a) (b)) (a))

Do you want to perform this combine for all vector types? You want the combine for an SVE specific reason and thus I'm wondering if it's better to restrict the combine to scalable vectors? Also, do use counts need to play a role here? I'm thinking that you might not want to flip the condition if it means generating additional compare instructions.

paulwalker-arm added a comment.EditedFeb 11 2022, 3:04 AM

Just a suggestion but another option regarding my scalable vectors only comment is that for SVE we lower all the floating point operations to predicated nodes so you could have a post lowering combine that looks for FADD_PRED rather than FADD. Not sure if there's a huge benefit to this but given you're trying to produce something more isel friendly, having the combine as close to isel as possible is perhaps beneficial. I guess it just depends on if the extra predicate used by FADD_PRED makes the combine awkward/ugly.

MattDevereau marked 2 inline comments as done.

Added constraints for scalable vector types and one setcc use only

@paulwalker-arm I replaced ISD::FMUL etc with AArch64::FMUL_PRED however it failed to do the combine afterwards


I don't think this pattern should be sensitive to the contents of the setcc, so we don't need to be looking for a 0 nor an 0.


Per above, I don't think this should read operands of the setcc, except for inverting the condcode at the end.

The combine condition wants to be something which matches when op appears on the right, and op's left operand is equal to the left operand of the vselect. This naturally prevents an infinite loop because it's not possible for the VSelect.LHS == VSelect.RHS.LHS to be true before and after the swap; and it's not true of VSelect.LHS == VSelect.RHS. (For these things to be true there would have to be a cycle of values, which is not allowed in the IR DAG).

I might find this a bit easier to read with LHS/RHS naming convention, since the vselect has an op0 which is the condcode, so its Op1 is the LHS of the vselect, whereas the OpOp0 is the LHS for the op.

So my suggestions for some clearer naming, if you need those things:
NOp1 => SelectA
NOp2 => SelectB
OpOp0 => OpLHS

Then the condition to perform the combine is SelectA == OpLHS, if SelectB.Opcode could profit from the transformation.



Removed SetCCOp0 == NOp2 and SetCCOp0 != NOp1 exit conditions
Added SelectA != SelectB.getOperand(0) exit condition

Functionally I think it's looking reasonable to me. A few more stylistic nits.


Just checking if you saw the suggestion above this comment, which adds rationale and makes the select pattern comment a little easier to read.


SDValue has Optional-like semantics built in: an SDValue() evaluates to false, so the optional is unnecessary here (I didn't see any other cases of Optional being used as a return argument like this in this file).


This if statement has a mix of conditions, referring to different things. It would be slightly better if it were grouped so that the setcc ones are next to each other at least. Better still, I might hoist the scalable query up to the top:

auto NTy = N->getValueType(0);
if (!NTy.isScalableVector())
  return None;

My concern is that the condition is hiding in there, someone scanning their eyes vertically at the condition might see 'setcc, setcc' on adjacent lines, and think that all of the conditions relate to setcc, where they do not.


SetCCOp0 is named here but the SetCC.getOperand(1) is not assigned a variable, so I'd drop the variable in this case because it is both single use and doesn't add any extra information.


Name clarity: inverting the vselect sounds like (not vselect). The object being inverted is the setcc condition code. Suggestion: A better name might be trySwapVSelectOperands?

MattDevereau edited the summary of this revision. (Show Details)
MattDevereau edited the summary of this revision. (Show Details)Feb 16 2022, 2:33 AM
MattDevereau edited the summary of this revision. (Show Details)
peterwaller-arm accepted this revision.Feb 16 2022, 3:02 AM
peterwaller-arm added inline comments.

Nit. This refers to attribute group #0 which is undefined.


Nit. Same again: This refers to attribute group #0 which is undefined.

This revision is now accepted and ready to land.Feb 16 2022, 3:02 AM
This revision was landed with ongoing or failed builds.Feb 17 2022, 8:01 AM
This revision was automatically updated to reflect the committed changes.