Details

Reviewers

peterwaller-arm
paulwalker-arm
bsmith
DavidTruby
david-arm
efriedma
kmclaughlin

Commits

rG2f2dcb4fb134: [AArch64][SVE] Invert VSelect operand order and condition for predicated…

Summary

[AArch64][SVE] Invert VSelect operand order and condition for predicated arithmetic operations

(vselect (setcc ( condcode) (_) (_)) (a)          (op (a) (b)))
=> (vselect (setcc (!condcode) (_) (_)) (op (a) (b)) (a))

As a follow-up to D117689, swap the vselect operands and invert the setcc
condition so that the vselect can be folded into a predicated instruction.
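
A minimal lane-wise model may help show why the rewrite preserves behaviour. This is only a sketch and not LLVM code: `Lanes`, `Pred`, `selectLanes`, `mulAll`, `mulMerging`, `invert`, `beforeSwap` and `afterSwap` are hypothetical helpers that treat a vector as a fixed-size array, and the model assumes the inverted predicate is the exact boolean complement of the original one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for a 4-lane vector and its predicate.
using Lanes = std::array<float, 4>;
using Pred = std::array<bool, 4>;

// vselect: take A where the predicate is true, B elsewhere.
Lanes selectLanes(const Pred &P, const Lanes &A, const Lanes &B) {
  Lanes R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = P[I] ? A[I] : B[I];
  return R;
}

// Unpredicated multiply of every lane.
Lanes mulAll(const Lanes &A, const Lanes &B) {
  Lanes R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = A[I] * B[I];
  return R;
}

// Model of a merging predicated multiply: active lanes compute A * B,
// inactive lanes keep the value of A.
Lanes mulMerging(const Pred &P, const Lanes &A, const Lanes &B) {
  Lanes R = A;
  for (std::size_t I = 0; I < R.size(); ++I)
    if (P[I])
      R[I] = A[I] * B[I];
  return R;
}

// Boolean complement of a predicate (assumed to model the inverted setcc).
Pred invert(const Pred &P) {
  Pred R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = !P[I];
  return R;
}

// Original form: (vselect p, a, (mul a, b)).
Lanes beforeSwap(const Pred &P, const Lanes &A, const Lanes &B) {
  return selectLanes(P, A, mulAll(A, B));
}

// Swapped form: (vselect !p, (mul a, b), a).
Lanes afterSwap(const Pred &P, const Lanes &A, const Lanes &B) {
  return selectLanes(invert(P), mulAll(A, B), A);
}
```

In this model the swapped form gives the same lanes as the original, and it also matches `mulMerging(invert(P), A, B)`, which is the shape a merging predicated instruction can implement directly.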

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

MattDevereau created this revision. Feb 10 2022, 3:37 AM

Herald added a reviewer: efriedma. Feb 10 2022, 3:37 AM

Herald added subscribers: psnobl, hiraditya, kristof.beyls, tschuett.

MattDevereau requested review of this revision. Feb 10 2022, 3:37 AM

Herald added a project: Restricted Project. Feb 10 2022, 3:37 AM

Herald added a subscriber: llvm-commits.

Adding @kmclaughlin as she wrote the original sve-fp-reciprocal tests.

bsmith added inline comments. Feb 10 2022, 3:59 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17156–17157	I believe this transform would loop given something like:

	m = fmul a, b
	p = setcc <cond> m, 0
	vselect p, m, m

17160	The comment above describing this transform isn't accurate as it doesn't reflect these restrictions around setcc.

MattDevereau added inline comments. Feb 10 2022, 4:11 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

17156–17157

Isn't `vselect p, m, m` a nop?

I've created a test for the example, which doesn't loop:

define <vscale x 4 x float> @fcmp_select_f32_double_op(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
; CHECK-LABEL: fcmp_select_f32_double_op:
; CHECK:       // %bb.0:
; CHECK-NEXT:    fmul z0.s, z0.s, z1.s
; CHECK-NEXT:    ret
  %m = fmul <vscale x 4 x float> %a, %b
  %fcmp = fcmp oeq <vscale x 4 x float> %m, zeroinitializer
  %sel = select <vscale x 4 x i1> %fcmp, <vscale x 4 x float> %m, <vscale x 4 x float> %m
  ret <vscale x 4 x float> %sel
}

bsmith added inline comments. Feb 10 2022, 4:16 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17156–17157	It will likely get removed as redundant, yes; I just worry about things like this that could end up getting through in esoteric cases.

Harbormaster completed remote builds in B148698: Diff 407458. Feb 10 2022, 4:31 AM

MattDevereau updated this revision to Diff 407514. Feb 10 2022, 6:29 AM

MattDevereau marked 2 inline comments as done. Feb 10 2022, 6:34 AM

MattDevereau added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17156–17157	Added test `fcmp_select_f32_double_op` to `llvm/test/CodeGen/AArch64/sve-select.ll` and added the condition `if (SetCCOp0 == NOp2) return None;` after the condition `SetCCOp0 != NOp1`.
17160	Updated the comment to:

	(vselect (setcc (a) (0)) (a) (op (a) (b)))
	=> (vselect (not setcc (a) (0)) (op (a) (b)) (a))

Harbormaster completed remote builds in B148733: Diff 407514. Feb 10 2022, 7:11 AM

Do you want to perform this combine for all vector types? You want the combine for an SVE specific reason and thus I'm wondering if it's better to restrict the combine to scalable vectors? Also, do use counts need to play a role here? I'm thinking that you might not want to flip the condition if it means generating additional compare instructions.

Just a suggestion but another option regarding my scalable vectors only comment is that for SVE we lower all the floating point operations to predicated nodes so you could have a post lowering combine that looks for FADD_PRED rather than FADD. Not sure if there's a huge benefit to this but given you're trying to produce something more isel friendly, having the combine as close to isel as possible is perhaps beneficial. I guess it just depends on if the extra predicate used by FADD_PRED makes the combine awkward/ugly.

Added constraints so that the combine applies only to scalable vector types and only when the setcc has a single use.

@paulwalker-arm I replaced ISD::FMUL etc. with AArch64::FMUL_PRED; however, the combine then failed to fire.

Harbormaster completed remote builds in B148962: Diff 407848. Feb 11 2022, 5:43 AM

peterwaller-arm added inline comments. Feb 14 2022, 8:31 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17155–17156	I don't think this pattern should be sensitive to the contents of the setcc, so we don't need to be looking for an `a` nor a `0`.
17168–17169	Per above, I don't think this should read operands of the setcc, except for inverting the condcode at the end.

	The combine condition wants to be something which matches when `op` appears on the right, and `op`'s left operand is equal to the left operand of the vselect. This naturally prevents an infinite loop because it's not possible for `VSelect.LHS == VSelect.RHS.LHS` to be true before and after the swap; and it's not true of `VSelect.LHS == VSelect.RHS`. (For these things to be true there would have to be a cycle of values, which is not allowed in the IR DAG.)

	I might find this a bit easier to read with an LHS/RHS naming convention, since the vselect has an op0 which is the condcode, so its `Op1` is the LHS of the vselect, whereas `OpOp0` is the LHS of the `op`.

	So my suggestions for some clearer naming, if you need those things:

	`NOp1` => `SelectA`
	`NOp2` => `SelectB`
	`OpOp0` => `OpLHS`

	Then the condition to perform the combine is `SelectA == OpLHS`, if `SelectB.Opcode` could profit from the transformation.
llvm/test/CodeGen/AArch64/sve-select.ll
654	Extraneous?
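
The termination argument in the comment above can be sketched with a toy DAG. This is only an illustration under stated assumptions: `Kind`, `Node`, `shouldSwap` and `swapOperands` are hypothetical names, not the LLVM `SDNode` API, and the condition node is reused unchanged because only the operand shape matters for the argument.

```cpp
#include <cassert>

// Hypothetical miniature of a DAG node; not the LLVM SDNode API.
enum class Kind { Leaf, FMul, VSelect };

struct Node {
  Kind K;
  const Node *Op0; // VSelect: condition; FMul: LHS
  const Node *Op1; // VSelect: SelectA;   FMul: RHS
  const Node *Op2; // VSelect: SelectB
  explicit Node(Kind K, const Node *Op0 = nullptr, const Node *Op1 = nullptr,
                const Node *Op2 = nullptr)
      : K(K), Op0(Op0), Op1(Op1), Op2(Op2) {}
};

// The swap applies only when SelectB is an arithmetic op whose LHS is
// SelectA, mirroring the `SelectA == OpLHS` condition from the review.
bool shouldSwap(const Node &Sel) {
  if (Sel.K != Kind::VSelect || !Sel.Op1 || !Sel.Op2)
    return false;
  const Node *SelectA = Sel.Op1;
  const Node *SelectB = Sel.Op2;
  return SelectB->K == Kind::FMul && SelectB->Op0 == SelectA;
}

// Build the swapped select: same condition slot, operands exchanged.
Node swapOperands(const Node &Sel) {
  return Node(Kind::VSelect, Sel.Op0, Sel.Op2, Sel.Op1);
}
```

After one swap the op sits on the left of the select, so `shouldSwap` can only fire again if the new right operand were itself an op whose LHS is the original op, which would require a cycle. A select whose two operands are the same `fmul` node does not match either, because the `fmul` cannot be its own left operand.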

Removed SetCCOp0 == NOp2 and SetCCOp0 != NOp1 exit conditions
Added SelectA != SelectB.getOperand(0) exit condition

Harbormaster completed remote builds in B149697: Diff 408869. Feb 15 2022, 8:29 AM

Functionally I think it's looking reasonable to me. A few more stylistic nits.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17155–17156	Just checking if you saw the suggestion above this comment, which adds rationale and makes the select pattern comment a little easier to read.
17157	SDValue has Optional-like semantics built in: an `SDValue()` evaluates to false, so the optional is unnecessary here (I didn't see any other cases of Optional being used as a return argument like this in this file).
17164–17165	This if statement has a mix of conditions, referring to different things. It would be slightly better if it were grouped so that the setcc ones are next to each other at least. Better still, I might hoist the scalable query up to the top:

	auto NTy = N->getValueType(0);
	if (!NTy.isScalableVector())
	  return None;

	My concern is that the condition is hiding in there; someone scanning their eyes vertically at the condition might see 'setcc, setcc' on adjacent lines, and think that all of the conditions relate to setcc, where they do not.
17182	`SetCCOp0` is named here but the `SetCC.getOperand(1)` is not assigned a variable, so I'd drop the variable in this case because it is both single use and doesn't add any extra information.
17197–17198	Name clarity: inverting the `vselect` sounds like `(not vselect)`. The object being inverted is the setcc condition code. Suggestion: A better name might be `trySwapVSelectOperands`?
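
The point about Optional-like semantics can be illustrated with a small stand-alone model. This is a sketch, not the real class: `ToyNode`, `ToyValue`, `tryCombine` and `combine` are hypothetical names, and the only property borrowed from `SDValue` is that an empty, default-constructed handle converts to false.

```cpp
#include <cassert>

// Hypothetical stand-in for a DAG node; only its address and opcode matter.
struct ToyNode {
  int Opcode;
};

// Toy value handle that wraps a node pointer and converts to false when it
// is empty, so no separate Optional wrapper is needed.
class ToyValue {
  ToyNode *N = nullptr;

public:
  ToyValue() = default;
  explicit ToyValue(ToyNode *Node) : N(Node) {}
  explicit operator bool() const { return N != nullptr; }
  ToyNode *getNode() const { return N; }
};

// A "try" helper: returns an empty ToyValue when the pattern does not match.
ToyValue tryCombine(ToyNode *Candidate, int WantedOpcode) {
  if (!Candidate || Candidate->Opcode != WantedOpcode)
    return ToyValue();
  return ToyValue(Candidate);
}

// Caller in the style of `if (auto R = tryX(...)) return R;`.
ToyValue combine(ToyNode *Candidate, ToyNode *Fallback) {
  if (auto R = tryCombine(Candidate, /*WantedOpcode=*/1))
    return R;
  return ToyValue(Fallback);
}
```

With this shape the caller tests the returned handle directly and returns it unchanged, which is the pattern the final version of the patch uses for `trySwapVSelectOperands`.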

MattDevereau updated this revision to Diff 409181. Feb 16 2022, 2:32 AM

MattDevereau edited the summary of this revision.

Herald added a subscriber: ctetreau. Feb 16 2022, 2:32 AM

MattDevereau edited the summary of this revision. Feb 16 2022, 2:33 AM

MattDevereau edited the summary of this revision.

Harbormaster completed remote builds in B149912: Diff 409181. Feb 16 2022, 2:57 AM

peterwaller-arm accepted this revision. Feb 16 2022, 3:02 AM

peterwaller-arm added inline comments.

llvm/test/CodeGen/AArch64/sve-select.ll
546	Nit. This refers to attribute group #0 which is undefined.
640–641	Nit. Same again: This refers to attribute group #0 which is undefined.

This revision is now accepted and ready to land. Feb 16 2022, 3:02 AM

This revision was landed with ongoing or failed builds. Feb 17 2022, 8:01 AM

Closed by commit rG2f2dcb4fb134: [AArch64][SVE] Invert VSelect operand order and condition for predicated… (authored by MattDevereau).

This revision was automatically updated to reflect the committed changes.

MattDevereau added a commit: rG2f2dcb4fb134: [AArch64][SVE] Invert VSelect operand order and condition for predicated….

paulwalker-arm mentioned this in D121905: [AArch64][SVE] Fix lowering of "fcmp ueq/one" when using SVE. Mar 17 2022, 11:35 AM

Diff 409659

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[17,146 lines not shown]
   }
   SDLoc DL(N);
   return DAG.getNode(NewOpc, DL, MVT::Other, N->getOperand(0), NewTestSrc,
                      DAG.getConstant(Bit, DL, MVT::i64), N->getOperand(3));
 }

+// Swap vselect operands where it may allow a predicated operation to achieve
+// the `sel`.
+//
+// (vselect (setcc ( condcode) (_) (_)) (a) (op (a) (b)))
+// => (vselect (setcc (!condcode) (_) (_)) (op (a) (b)) (a))
+static SDValue trySwapVSelectOperands(SDNode *N, SelectionDAG &DAG) {
+  auto SelectA = N->getOperand(1);
+  auto SelectB = N->getOperand(2);
+  auto NTy = N->getValueType(0);
+  if (!NTy.isScalableVector())
+    return SDValue();
+  SDValue SetCC = N->getOperand(0);
+  if (SetCC.getOpcode() != ISD::SETCC || !SetCC.hasOneUse())
+    return SDValue();
+  switch (SelectB.getOpcode()) {
+  default:
+    return SDValue();
+  case ISD::FMUL:
+  case ISD::FSUB:
+  case ISD::FADD:
+    break;
+  }
+  if (SelectA != SelectB.getOperand(0))
+    return SDValue();
+  ISD::CondCode CC = cast<CondCodeSDNode>(SetCC->getOperand(2))->get();
+  auto InverseSetCC = DAG.getSetCC(
+      SDLoc(SetCC), SetCC.getValueType(), SetCC.getOperand(0),
+      SetCC.getOperand(1), ISD::getSetCCInverse(CC, SetCC.getValueType()));
+  return DAG.getNode(ISD::VSELECT, SDLoc(N), NTy,
+                     {InverseSetCC, SelectB, SelectA});
+}

 // vselect (v1i1 setcc) ->
 // vselect (v1iXX setcc) (XX is the size of the compared operand type)
 // FIXME: Currently the type legalizer can't handle VSELECT having v1i1 as
 // condition. If it can legalize "VSELECT v1i1" correctly, no need to combine
 // such VSELECT.
 static SDValue performVSelectCombine(SDNode *N, SelectionDAG &DAG) {
+  if (auto SwapResult = trySwapVSelectOperands(N, DAG))
+    return SwapResult;
   SDValue N0 = N->getOperand(0);
   EVT CCVT = N0.getValueType();
   if (isAllActivePredicate(DAG, N0))
     return N->getOperand(1);
   if (isAllInactivePredicate(N0))
     return N->getOperand(2);
[3,114 lines not shown]

llvm/test/CodeGen/AArch64/sve-fp-reciprocal.ll

[89 lines not shown]
 }

 define <vscale x 8 x half> @fsqrt_recip_8f16(<vscale x 8 x half> %a) #0 {
 ; CHECK-LABEL: fsqrt_recip_8f16:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: frsqrte z1.h, z0.h
 ; CHECK-NEXT: ptrue p0.h
 ; CHECK-NEXT: fmul z2.h, z1.h, z1.h
-; CHECK-NEXT: fcmeq p0.h, p0/z, z0.h, #0.0
+; CHECK-NEXT: fcmne p0.h, p0/z, z0.h, #0.0
 ; CHECK-NEXT: frsqrts z2.h, z0.h, z2.h
 ; CHECK-NEXT: fmul z1.h, z1.h, z2.h
 ; CHECK-NEXT: fmul z2.h, z1.h, z1.h
 ; CHECK-NEXT: frsqrts z2.h, z0.h, z2.h
 ; CHECK-NEXT: fmul z1.h, z1.h, z2.h
-; CHECK-NEXT: fmul z1.h, z0.h, z1.h
-; CHECK-NEXT: sel z0.h, p0, z0.h, z1.h
+; CHECK-NEXT: fmul z0.h, p0/m, z0.h, z1.h
 ; CHECK-NEXT: ret
   %fsqrt = call fast <vscale x 8 x half> @llvm.sqrt.nxv8f16(<vscale x 8 x half> %a)
   ret <vscale x 8 x half> %fsqrt
 }

 define <vscale x 4 x float> @fsqrt_4f32(<vscale x 4 x float> %a) {
 ; CHECK-LABEL: fsqrt_4f32:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ptrue p0.s
 ; CHECK-NEXT: fsqrt z0.s, p0/m, z0.s
 ; CHECK-NEXT: ret
   %fsqrt = call fast <vscale x 4 x float> @llvm.sqrt.nxv4f32(<vscale x 4 x float> %a)
   ret <vscale x 4 x float> %fsqrt
 }

 define <vscale x 4 x float> @fsqrt_recip_4f32(<vscale x 4 x float> %a) #0 {
 ; CHECK-LABEL: fsqrt_recip_4f32:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: frsqrte z1.s, z0.s
 ; CHECK-NEXT: ptrue p0.s
 ; CHECK-NEXT: fmul z2.s, z1.s, z1.s
-; CHECK-NEXT: fcmeq p0.s, p0/z, z0.s, #0.0
+; CHECK-NEXT: fcmne p0.s, p0/z, z0.s, #0.0
 ; CHECK-NEXT: frsqrts z2.s, z0.s, z2.s
 ; CHECK-NEXT: fmul z1.s, z1.s, z2.s
 ; CHECK-NEXT: fmul z2.s, z1.s, z1.s
 ; CHECK-NEXT: frsqrts z2.s, z0.s, z2.s
 ; CHECK-NEXT: fmul z1.s, z1.s, z2.s
-; CHECK-NEXT: fmul z1.s, z0.s, z1.s
-; CHECK-NEXT: sel z0.s, p0, z0.s, z1.s
+; CHECK-NEXT: fmul z0.s, p0/m, z0.s, z1.s
 ; CHECK-NEXT: ret
   %fsqrt = call fast <vscale x 4 x float> @llvm.sqrt.nxv4f32(<vscale x 4 x float> %a)
   ret <vscale x 4 x float> %fsqrt
 }

 define <vscale x 2 x double> @fsqrt_2f64(<vscale x 2 x double> %a) {
 ; CHECK-LABEL: fsqrt_2f64:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ptrue p0.d
 ; CHECK-NEXT: fsqrt z0.d, p0/m, z0.d
 ; CHECK-NEXT: ret
   %fsqrt = call fast <vscale x 2 x double> @llvm.sqrt.nxv2f64(<vscale x 2 x double> %a)
   ret <vscale x 2 x double> %fsqrt
 }

 define <vscale x 2 x double> @fsqrt_recip_2f64(<vscale x 2 x double> %a) #0 {
 ; CHECK-LABEL: fsqrt_recip_2f64:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: frsqrte z1.d, z0.d
 ; CHECK-NEXT: ptrue p0.d
 ; CHECK-NEXT: fmul z2.d, z1.d, z1.d
-; CHECK-NEXT: fcmeq p0.d, p0/z, z0.d, #0.0
+; CHECK-NEXT: fcmne p0.d, p0/z, z0.d, #0.0
 ; CHECK-NEXT: frsqrts z2.d, z0.d, z2.d
 ; CHECK-NEXT: fmul z1.d, z1.d, z2.d
 ; CHECK-NEXT: fmul z2.d, z1.d, z1.d
 ; CHECK-NEXT: frsqrts z2.d, z0.d, z2.d
 ; CHECK-NEXT: fmul z1.d, z1.d, z2.d
 ; CHECK-NEXT: fmul z2.d, z1.d, z1.d
 ; CHECK-NEXT: frsqrts z2.d, z0.d, z2.d
 ; CHECK-NEXT: fmul z1.d, z1.d, z2.d
-; CHECK-NEXT: fmul z1.d, z0.d, z1.d
-; CHECK-NEXT: sel z0.d, p0, z0.d, z1.d
+; CHECK-NEXT: fmul z0.d, p0/m, z0.d, z1.d
 ; CHECK-NEXT: ret
   %fsqrt = call fast <vscale x 2 x double> @llvm.sqrt.nxv2f64(<vscale x 2 x double> %a)
   ret <vscale x 2 x double> %fsqrt
 }

 declare <vscale x 2 x half> @llvm.sqrt.nxv2f16(<vscale x 2 x half>)
 declare <vscale x 4 x half> @llvm.sqrt.nxv4f16(<vscale x 4 x half>)
 declare <vscale x 8 x half> @llvm.sqrt.nxv8f16(<vscale x 8 x half>)
 declare <vscale x 2 x float> @llvm.sqrt.nxv2f32(<vscale x 2 x float>)
 declare <vscale x 4 x float> @llvm.sqrt.nxv4f32(<vscale x 4 x float>)
 declare <vscale x 2 x double> @llvm.sqrt.nxv2f64(<vscale x 2 x double>)

 attributes #0 = { "reciprocal-estimates"="all" }

llvm/test/CodeGen/AArch64/sve-select.ll

[536 lines not shown]
 ; CHECK-NEXT: sbfx x8, x8, #0, #1
 ; CHECK-NEXT: whilelo p2.b, xzr, x8
 ; CHECK-NEXT: sel p0.b, p2, p0.b, p1.b
 ; CHECK-NEXT: ret
   %mask = icmp eq i64 %x0, 0
   %sel = select i1 %mask, <vscale x 16 x i1> %a, <vscale x 16 x i1> %b
   ret <vscale x 16 x i1> %sel
 }
+
+define <vscale x 4 x float> @select_f32_invert_fmul(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
+; CHECK-LABEL: select_f32_invert_fmul:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: fcmne p0.s, p0/z, z0.s, #0.0
+; CHECK-NEXT: fmul z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: ret
+  %p = fcmp oeq <vscale x 4 x float> %a, zeroinitializer
+  %fmul = fmul <vscale x 4 x float> %a, %b
+  %sel = select <vscale x 4 x i1> %p, <vscale x 4 x float> %a, <vscale x 4 x float> %fmul
+  ret <vscale x 4 x float> %sel
+}
+
+define <vscale x 4 x float> @select_f32_invert_fadd(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
+; CHECK-LABEL: select_f32_invert_fadd:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: fcmne p0.s, p0/z, z0.s, #0.0
+; CHECK-NEXT: fadd z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: ret
+  %p = fcmp oeq <vscale x 4 x float> %a, zeroinitializer
+  %fadd = fadd <vscale x 4 x float> %a, %b
+  %sel = select <vscale x 4 x i1> %p, <vscale x 4 x float> %a, <vscale x 4 x float> %fadd
+  ret <vscale x 4 x float> %sel
+}
+
+define <vscale x 4 x float> @select_f32_invert_fsub(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
+; CHECK-LABEL: select_f32_invert_fsub:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: fcmne p0.s, p0/z, z0.s, #0.0
+; CHECK-NEXT: fsub z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: ret
+  %p = fcmp oeq <vscale x 4 x float> %a, zeroinitializer
+  %fsub = fsub <vscale x 4 x float> %a, %b
+  %sel = select <vscale x 4 x i1> %p, <vscale x 4 x float> %a, <vscale x 4 x float> %fsub
+  ret <vscale x 4 x float> %sel
+}
+
+define <vscale x 4 x float> @select_f32_no_invert_op_lhs(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
+; CHECK-LABEL: select_f32_no_invert_op_lhs:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: fcmeq p0.s, p0/z, z0.s, #0.0
+; CHECK-NEXT: fmul z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: ret
+  %p = fcmp oeq <vscale x 4 x float> %a, zeroinitializer
+  %fmul = fmul <vscale x 4 x float> %a, %b
+  %sel = select <vscale x 4 x i1> %p, <vscale x 4 x float> %fmul, <vscale x 4 x float> %a
+  ret <vscale x 4 x float> %sel
+}
+
+define <vscale x 4 x float> @select_f32_no_invert_2_op(<vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x float> %c, <vscale x 4 x float> %d) {
+; CHECK-LABEL: select_f32_no_invert_2_op:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: fmul z2.s, z2.s, z3.s
+; CHECK-NEXT: fcmeq p0.s, p0/z, z0.s, #0.0
+; CHECK-NEXT: fmul z0.s, z0.s, z1.s
+; CHECK-NEXT: sel z0.s, p0, z0.s, z2.s
+; CHECK-NEXT: ret
+  %p = fcmp oeq <vscale x 4 x float> %a, zeroinitializer
+  %fmul1 = fmul <vscale x 4 x float> %a, %b
+  %fmul2 = fmul <vscale x 4 x float> %c, %d
+  %sel = select <vscale x 4 x i1> %p, <vscale x 4 x float> %fmul1, <vscale x 4 x float> %fmul2
+  ret <vscale x 4 x float> %sel
+}
+
+define <vscale x 4 x float> @select_f32_no_invert_equal_ops(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
+; CHECK-LABEL: select_f32_no_invert_equal_ops:
+; CHECK: // %bb.0:
+; CHECK-NEXT: fmul z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %m = fmul <vscale x 4 x float> %a, %b
+  %p = fcmp oeq <vscale x 4 x float> %m, zeroinitializer
+  %sel = select <vscale x 4 x i1> %p, <vscale x 4 x float> %m, <vscale x 4 x float> %m
+  ret <vscale x 4 x float> %sel
+}
+
+define <vscale x 4 x float> @select_f32_no_invert_fmul_two_setcc_uses(<vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x float> %c, i32 %len) #0 {
+; CHECK-LABEL: select_f32_no_invert_fmul_two_setcc_uses:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: fadd z1.s, z0.s, z1.s
+; CHECK-NEXT: fcmeq p0.s, p0/z, z0.s, #0.0
+; CHECK-NEXT: sel z0.s, p0, z0.s, z1.s
+; CHECK-NEXT: mov z0.s, p0/m, z2.s
+; CHECK-NEXT: ret
+  %p = fcmp oeq <vscale x 4 x float> %a, zeroinitializer
+  %fadd = fadd <vscale x 4 x float> %a, %b
+  %sel = select <vscale x 4 x i1> %p, <vscale x 4 x float> %a, <vscale x 4 x float> %fadd
+  %sel2 = select <vscale x 4 x i1> %p, <vscale x 4 x float> %c, <vscale x 4 x float> %sel
+  ret <vscale x 4 x float> %sel2
+}
+
+define <4 x float> @select_f32_no_invert_not_scalable(<4 x float> %a, <4 x float> %b) #0 {
+; CHECK-LABEL: select_f32_no_invert_not_scalable:
+; CHECK: // %bb.0:
+; CHECK-NEXT: fcmeq v2.4s, v0.4s, #0.0
+; CHECK-NEXT: fmul v1.4s, v0.4s, v1.4s
+; CHECK-NEXT: bif v0.16b, v1.16b, v2.16b
+; CHECK-NEXT: ret
+  %p = fcmp oeq <4 x float> %a, zeroinitializer
+  %fmul = fmul <4 x float> %a, %b
+  %sel = select <4 x i1> %p, <4 x float> %a, <4 x float> %fmul
+  ret <4 x float> %sel
+}

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Invert VSelect operand order and condition for predicated arithmetic operations
Closed · Public
