This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE]: custom lower AVGFloor/AVGCeil.
ClosedPublic

Authored by hassnaa-arm on Feb 3 2023, 9:57 AM.

Details

Summary

- Lower AVGFloor(A, B) to:

  SRL(A) + SRL(B) + ((A & B) & 1)

- Lower AVGCeil(A, B) to:

  SRL(A) + SRL(B) + ((A | B) & 1)
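
A quick sanity check with small values: for A = 7, B = 4, AVGFloor = (7 + 4) >> 1 = 5 and (7 >> 1) + (4 >> 1) + ((7 & 4) & 1) = 3 + 2 + 0 = 5; AVGCeil = (7 + 4 + 1) >> 1 = 6 and (7 >> 1) + (4 >> 1) + ((7 | 4) & 1) = 3 + 2 + 1 = 6.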

Diff Detail

Event Timeline

hassnaa-arm created this revision.Feb 3 2023, 9:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 3 2023, 9:57 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
hassnaa-arm requested review of this revision.Feb 3 2023, 9:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 3 2023, 9:57 AM
sdesmalen requested changes to this revision.Feb 6 2023, 12:58 AM

This patch doesn't seem like it is in a state to review yet (e.g. it doesn't have any tests); perhaps you can add 'WIP' to the title to make it clear that this is a Work In Progress?

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1420–1434 ↗(On Diff #494676)

Just as a suggestion for future uses of these pattern matchers: you could write this more compactly, like this:

if (match(Op1, m_SpecificInt(1)) &&
    match(Op2, m_Add(m_Add(m_OneUse(m_Value(A)), m_OneUse(m_Value(B))),
                     m_ConstantInt(C))) &&
    (C->getZExtValue() == 0 || C->getZExtValue() == 1)) {
  ...
}
1456 ↗(On Diff #494676)

This transform is only profitable when the target has no explicit instructions for this pattern (e.g. Arm's urhadd/srhadd) and the extends are expensive. The InstCombine pass transforms the IR to a canonical form and doesn't use the CostModel to decide whether a certain form is profitable. A better place to do this would be a DAGCombine (to start, it's good enough for this to be target-specific, e.g. a combine in AArch64ISelLowering.cpp), where we can simply check whether the target being compiled for has the urhadd/srhadd instructions and, if not, do the transform.
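
For illustration, a minimal sketch of the suggested shape (the function name and exact gating here are hypothetical, not the final patch):

static SDValue performLshrHaddCombine(SDNode *N, SelectionDAG &DAG,
                                      const AArch64Subtarget *Subtarget) {
  // urhadd/srhadd exist in NEON and (for scalable vectors) SVE2; only
  // expand the pattern when no native instruction is available.
  if (Subtarget->hasSVE2())
    return SDValue();
  // ... match trunc(lshr(add(add(ext(a), ext(b)), C), 1)) here and
  // rewrite it to the shift-based form.
  return SDValue();
}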

This revision now requires changes to proceed.Feb 6 2023, 12:58 AM
hassnaa-arm retitled this revision from [Transform][InstCombine]: transform lshr pattern. to [Transform][InstCombine]: transform lshr pattern. [WIP].Feb 6 2023, 2:01 AM
Matt added a subscriber: Matt.Feb 6 2023, 12:16 PM

Move combining trunc shift and extend shift to AArch64

hassnaa-arm retitled this revision from [Transform][InstCombine]: transform lshr pattern. [WIP] to [AArch64][combine]: transform lshr pattern. [WIP].Feb 7 2023, 10:28 AM
hassnaa-arm edited the summary of this revision. (Show Details)
hassnaa-arm retitled this revision from [AArch64][combine]: transform lshr pattern. [WIP] to [AArch64][combine]: combine lshr pattern. [WIP].Feb 7 2023, 10:28 AM
hassnaa-arm added inline comments.Feb 7 2023, 10:36 AM
llvm/test/CodeGen/AArch64/neon-lshr.ll
22 ↗(On Diff #495593)

I investigated the generated DAG to see the effect of my changes and found that they did affect it.
Here is the DAG just before the instruction-selection step:

SelectionDAG has 19 nodes:
  t0: ch,glue = EntryToken
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t4: i32,ch = CopyFromReg t0, Register:i32 %1
          t50: i32 = and t2, Constant:i32<254>
        t37: i32 = srl t50, Constant:i64<1>
          t52: i32 = and t4, Constant:i32<254>
        t42: i32 = srl t52, Constant:i64<1>
      t43: i32 = add t37, t42
        t39: i32 = or t2, t4
      t40: i32 = and t39, Constant:i32<1>
    t44: i32 = add t43, t40
  t17: ch,glue = CopyToReg t0, Register:i32 $w0, t44
  t18: ch = AArch64ISD::RET_FLAG t17, Register:i32 $w0, t17:1
sdesmalen added inline comments.Feb 8 2023, 3:00 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17699

As I understand it, the point is to simplify the following:

   trunc(lshr(add(add(ext(a), ext(b)), 1), 1))
-> lshr(a, 1) + lshr(b, 1) + (a | b) & 1

iff the type of ext(a) is not a legal type
17701

I see that llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp tries to combine this pattern into a simpler AVGFLOORS/AVGFLOORU/AVGCEILS/AVGCEILU node in the function combineShiftToAVG. Is there a way to reuse this existing mechanism and then Custom lower this node to the desired set of instructions? I even wonder if we could generalise this code (i.e. make it not specific to AArch64) when the nodes are set to Expand.
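
For reference, that mechanism would look roughly like this in the AArch64 backend (a simplified sketch; the exact types and gating are assumptions, not the final patch):

for (MVT VT : {MVT::nxv16i8, MVT::nxv8i16, MVT::nxv4i32, MVT::nxv2i64}) {
  // Let combineShiftToAVG still form the AVG nodes, then expand them
  // in the target's Custom lowering when no native instruction exists.
  setOperationAction(ISD::AVGFLOORS, VT, Custom);
  setOperationAction(ISD::AVGFLOORU, VT, Custom);
  setOperationAction(ISD::AVGCEILS, VT, Custom);
  setOperationAction(ISD::AVGCEILU, VT, Custom);
}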

17701

You're testing for SVE2 (vector extension), but the test is using scalar types, not vector types. That doesn't seem entirely right?

18677

I don't believe this is a transform we want to make.

Add AArch64 implementation for custom-lowering AVGFloor/AVGCeil

hassnaa-arm retitled this revision from [AArch64][combine]: combine lshr pattern. [WIP] to [AArch64][SVE]: custom lower AVGFloor/AVGCeil. [WIP].Feb 10 2023, 4:02 AM
hassnaa-arm edited the summary of this revision. (Show Details)

Remove neon-lshr.ll

Remove old code that is not used now.

Please pre-commit the new testcases, so the changes are more visible.

The <vscale x 2 x i128> variations currently crash the compiler; I'd recommend fixing that before trying to optimize them.

The "expected" code for haddu_v2i32 is worse than what we currently generate.

Add precursory patch.

Optimize the generated code by checking if the extended node was previously truncated.

Optimize the generated code by checking if the extended nodes were previously truncated.

hassnaa-arm retitled this revision from [AArch64][SVE]: custom lower AVGFloor/AVGCeil. [WIP] to [AArch64][SVE]: custom lower AVGFloor/AVGCeil..Feb 20 2023, 2:04 AM
sdesmalen added inline comments.Feb 20 2023, 3:41 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1034 ↗(On Diff #498046)

This is probably better added as a separate DAGCombine that explicitly folds away the sign-extend-in-reg, because that is already performed by the avgfloor operation.

avgfloors(sextinreg(x), sextinreg(y)) -> avgfloors(x, y)
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13404

You seem to be trying to optimise the case where one of the operands is a vector of ones, but I don't see any motivating tests for this?
I would also expect these cases to be folded away already by existing DAGCombines.

llvm/test/CodeGen/AArch64/sve-avg_floor_ceil.ll
64

This doesn't look like an improvement, we don't really want to do this transform if it makes the resulting code worse. Do you know why this results in worse code?

hassnaa-arm added inline comments.Feb 20 2023, 4:37 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1034 ↗(On Diff #498046)

Sorry, I don't understand this comment; could you clarify it a bit more?
Why did you mention sign-extend-in-reg, and how is it related to this code?

sdesmalen added inline comments.Feb 20 2023, 4:52 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1034 ↗(On Diff #498046)

The truncate + sign-extend will be combined into a sign-extend-in-reg. You can see that if you disable the code you added below on lines 1046-1055 and run the test @hadd8_sext_asr.

hassnaa-arm added inline comments.Feb 20 2023, 5:11 AM
llvm/test/CodeGen/AArch64/sve-avg_floor_ceil.ll
64

With the new changes, the generated instructions are exactly equivalent to AVGFloor; there are no additional instructions.
I think nothing more can be done for this case.

sdesmalen added inline comments.Feb 20 2023, 8:45 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1034 ↗(On Diff #498046)

Given my other suggestion, please ignore my suggestion above to add a DAGCombine (which I'm not even sure is legal; the semantics of the AVGFLOOR operation still aren't entirely clear to me)

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13397

Should this be using an arithmetic shift when the operation is signed?

llvm/test/CodeGen/AArch64/sve-avg_floor_ceil.ll
64

For these cases (where the input is an unpacked vector that is promoted using an and (for the unsigned case) or sign_extend_inreg (for the signed case)) we can emit the original code when Custom lowering these nodes. That way we can mitigate the regression.

161

Should this use ashr (arithmetic shift, since the values s0s/s1s are signed)?

Same question for other tests in this file.

hassnaa-arm marked 4 inline comments as done.

Check if it's better to emit the original code or custom lower AVGFloor/Ceil

Change lshr to ashr for signed cases in the precursory patch.

In the CEIL case, add an ADD operation for the constant 1.

sdesmalen added inline comments.Feb 23 2023, 1:57 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13402

This is missing a check to ensure that the and value is a mask, and that the masked 'type' is smaller than the size of the current type.
Perhaps you can get some inspiration from isExtendOrShiftOperand, which checks whether the operation is zero- or sign-extended.

In either case, it would be nice to have this logic in separate isSignedExtended() and isZeroExtended() functions.

13405

There are several uses of Node->getOpcode(); you can move that to a separate variable.

It would also be nice to have some variables:

  • IsSigned (for AVGFLOORS and AVGCEILS)
  • IsFloor (for AVGFLOORS and AVGFLOORU)
  • ShiftOpc = IsSigned ? ISD::SRA : ISD::SRL;

That way you can simplify the code a bit, without having to check the opcodes everywhere, and you can combine some of the if/else branches.

llvm/test/CodeGen/AArch64/sve-avg_floor_ceil.ll
1

The filename has both a dash (-) and underscores (_). Can you rename it to sve-avg-floor-ceil.ll?

2

Can you add two RUN lines, one for -mattr=+sve and one for -mattr=+sve2?

4

Can you add these tests to the precursory patch, so that this patch only shows the diff?

hassnaa-arm marked 4 inline comments as done.Feb 23 2023, 8:41 AM
hassnaa-arm added inline comments.
llvm/test/CodeGen/AArch64/sve-avg_floor_ceil.ll
4

Adding them causes a crash.
The crash message is: "LLVM ERROR: Scalarization of scalable vectors is not supported."
It crashes while legalising the result type of nxv2i128.

Enhance code readability.

Hello. Sorry for the delay in looking at this, but I wasn't sure exactly what you were trying to do, and I've never been a huge fan of DAG combines that create the wrong node just to expand it later. It looks like for legal types this can lead to a nice decrease in instruction count, though.

For smaller types I'm not sure that checking for individual opcodes for extension will work well; they could already be extending loads, for example. I've not thought about it too much yet, but as far as I understand from the original hadd (/avg) work, it would probably be better to check that the top bits are known 0 for the unsigned cases, and that there are >1 sign bits for the signed cases.

In any case, it would be good to see alive proofs for the transforms you are making.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
6090–6092

This can be moved into the start of LowerAVG to make this function a little more regular. (I know it's not super regular already, but other parts already do the same thing).

13391

Can it use countTrailingOnes as opposed to countPopulation? Although I think these functions might be better removed and use computeKnownBits checks instead.
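
To illustrate the difference (illustrative values, not from the patch): countPopulation only counts set bits, while countTrailingOnes additionally requires them to be contiguous from bit 0, which is what makes a value a low-bit mask.

APInt LowMask(8, 0x7F);
unsigned TO = LowMask.countTrailingOnes(); // 7: bits 0..6 form a contiguous low mask
unsigned PC = LowMask.countPopulation();   // also 7
APInt HighBits(8, 0xF0);
TO = HighBits.countTrailingOnes();         // 0: not a low-bit mask
PC = HighBits.countPopulation();           // 4: popcount alone cannot tell the difference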

13405

This can just be called LowerAVG

13427

check should be Check

13428

Why is it OK to only check one of the operands?

13437

Doesn't need the else after the return, which might help simplify if this is checking the sign bits.

llvm/test/CodeGen/AArch64/sve-avg-floor-ceil.ll
3 ↗(On Diff #499878)

There is already a test for hadds in sve in sve2-hadd. This test looks like a copy of it. It would probably be better to just use the existing test with an extra run line for SVE1 targets.

llvm/test/CodeGen/AArch64/sve-avg_floor_ceil.ll
4

Like Eli said, it would be good to fix vscale x 1 vectorization so this wasn't a problem any more.

hassnaa-arm marked 6 inline comments as done.

Remove sve-avgfloor testing file.
Add RUN line for sve to sve2-hadd
Rename sve2-hadd to sve-hadd

hassnaa-arm added inline comments.Feb 28 2023, 4:53 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13391

I think using countTrailingOnes is much simpler than computeKnownBits because the operand and its value are already known.

sdesmalen added inline comments.Feb 28 2023, 5:46 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13391

I presume that @dmgreen meant using computeKnownBits on Node as a whole, not specifically on the splatted value, so that you don't need to check for the opcode explicitly.

See for example checkZExtBool, where it does:

APInt RequredZero(SizeInBits, 0xFE);
KnownBits Bits = DAG.computeKnownBits(Arg, 4);
bool ZExtBool = (Bits.Zero & RequredZero) == RequredZero;

You can probably do a similar thing for IsSignExtended, but then looking at the Bits.One instead of Bits.Zero.
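
As a hypothetical sketch of that suggestion (the helper name and shape are assumptions, not the patch itself):

auto IsZeroExtended = [&](SDValue V) {
  KnownBits Known = DAG.computeKnownBits(V);
  // The value behaves as zero-extended iff the top bit is known zero.
  return Known.Zero.isSignBitSet();
};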

Use computeKnownBits for checking zeroExtend/signExtend.

Use ComputeNumSignBits instead of ComputeKnownBits for SIGN_EXTEND_INREG ops.

hassnaa-arm added inline comments.Mar 1 2023, 7:42 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13402

I used DAG.ComputeNumSignBits instead of computeKnownBits, because computeKnownBits doesn't determine any known bits for SIGN_EXTEND_INREG

llvm/test/CodeGen/AArch64/sve-hadd.ll
104 ↗(On Diff #501511)

@dmgreen I tried to get alive proofs for this transform (https://alive2.llvm.org/ce/),
but it seems something is wrong.
I'm not sure whether the problem is with my IR equivalent of the generated code, or whether the transform itself is incorrect.

Hi - I was just looking at the patch whilst you updated it! Please ignore any comments that don't apply any more.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13387

It might be better to make these lambdas inside LowerAVG. The names sound fairly generic but they are in practice tied to hadd nodes.

13388

Node.getValueType()...

13390

I believe this only needs 1 bit to be zero, so it could probably use Known.Zero.isSignBitSet()?
There is a proof in https://alive2.llvm.org/ce/z/cEtdJa for rhadd.

13397

I think this one would be better to use computeNumSignBits. It's less important whether they are 0 or 1, just that they are the same as the top bit. Again I think it needs 1 bit to be valid, so computeNumSignBits(..) > 1 should do it.
https://alive2.llvm.org/ce/z/VbxRs-.

It would be good to see proofs for the other bit here too where the hadd is expanded to SRL(A) + SRL(B) + (A&B)&1
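
A hypothetical sketch of the signed-side check being suggested (the helper name is an assumption):

auto IsSignExtended = [&](SDValue V) {
  // More than one sign bit means the top bit just duplicates the sign,
  // i.e. the value already fits in a narrower signed type.
  return DAG.ComputeNumSignBits(V) > 1;
};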

13415–13421

I would probably do:

unsigned Opc = Op->getOpcode();
bool IsCeil = Opc == ISD::AVGCEILS || Opc == ISD::AVGCEILU;
bool IsSigned = Opc == ISD::AVGFLOORS || Opc == ISD::AVGCEILS;
13428

&&, not ||, I would expect.

A HADD is equivalent to trunc(shr(add(ext(x), ext(y)), 1)); it is not directly equivalent to shr(add(x, y), 1). So we need to prove that turning it into the simpler add+shift version is better. It's not really "emit the original code".
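
For example, for unsigned i8 with x = y = 255: the widened form gives trunc((255 + 255) >> 1) = 255, while the narrow shr(add(x, y), 1) wraps the add to 254 and yields 127.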

llvm/test/CodeGen/AArch64/sve-hadd.ll
75 ↗(On Diff #501511)

This ideally shouldn't change. Because the top bit isn't demanded, instcombine will transform ashr into lshr, and we should be testing what the backend will see: https://godbolt.org/z/bvof8j7cc. I guess the lshr version isn't transformed after it gets promoted? It might be OK in this case.

104 ↗(On Diff #501511)

Can you update the link? There are some other alive links that could help.

hassnaa-arm marked 5 inline comments as done.Mar 1 2023, 12:13 PM
hassnaa-arm added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13390

Is that because the sign bit will be known to be zero only in the zero-extend case?

llvm/test/CodeGen/AArch64/sve-hadd.ll
75 ↗(On Diff #501511)

But in some cases, using lshr in the IR causes an extra AND instruction to be generated.

Check if both operands of AVG are extended, not just single one.

dmgreen added inline comments.Mar 2 2023, 12:21 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13390

Oh right. I was assuming that it would use if (!IsSigned && IsZeroExtended(OpA) && IsZeroExtended(OpB)) and if (IsSigned && IsSignExtended(OpA) && IsSignExtended(OpB)).

An hadd/rhadd is defined as converting into an arbitrarily wide integer before doing the add/shift and then converting back. It just happens that it only needs 1 extra bit for that to be equivalent for any other type size. So we only need to check that the top bit is known to be 0. (isSignBitSet doesn't really have anything to do with signedness here; it's just a way of checking the top bit.)

hassnaa-arm marked an inline comment as done.Mar 2 2023, 6:59 AM

While checking for zero-extension, checking only the sign bit of the known zeros is enough.

sdesmalen added inline comments.Mar 2 2023, 7:34 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13396

Move this closer to use.

13399

nit: unnecessary newline.

Also, these lines exceed the 80char limit, so please run clang-format on this code.

13412

You can remove this condition; it's covered by DAG.ComputeNumSignBits.

13418–13434

You can combine these cases doing something like this:

if ((IsSigned && IsSignExtended(OpA) && IsSignExtended(OpB)) ||
    (!IsSigned && IsZeroExtended(OpA) && IsZeroExtended(OpB))) {
  ... 
  unsigned ShiftOpc = IsSigned ? ISD::SRA : ISD::SRL;
  return DAG.getNode(ShiftOpc, dl, VT, Add, ConstantOne);
}
13440–13444

nit: SDValue Tmp = DAG.getNode(IsCeil ? ISD::OR : ISD::AND, dl, VT, OpA, OpB);

hassnaa-arm marked 5 inline comments as done.

Enhance code readability.

Thanks. Here are some alive proofs for the transform: https://alive2.llvm.org/ce/z/N6hwQY and https://alive2.llvm.org/ce/z/u_GjYJ.

Can you extend the testing to include both ashr and lshr versions? They should both be useful if we are custom legalizing the nodes. Otherwise I think this looks good.

llvm/test/CodeGen/AArch64/sve-hadd.ll
25 ↗(On Diff #502090)

Can you copy these tests so there are versions with both lshr and ashr.

Add test cases for logical shr.

hassnaa-arm added inline comments.Mar 7 2023, 2:17 AM
llvm/test/CodeGen/AArch64/sve-hadd.ll
904 ↗(On Diff #502962)

Here is a transformation proof for this case:
https://alive2.llvm.org/ce/z/vPJi6R
@dmgreen

dmgreen accepted this revision.Mar 7 2023, 2:59 AM

I think it's worth adding tests for both the ashr and lshr versions, but otherwise I think this LGTM. Thanks

llvm/test/CodeGen/AArch64/sve-hadd.ll
166 ↗(On Diff #502962)

Is there an lshr version of this one? It would be good to have some that are "full width" and use lshr.
As instcombine will convert all the ashr to lshr, it might be best to make sure there are tests for all the functions that were changed.

904 ↗(On Diff #502962)

I think that's another case. This one would be https://alive2.llvm.org/ce/z/tp5NmX. From what I can tell they all look OK, according to alive.

Thanks for all the changes @hassnaa-arm, I've just left some final minor comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13387

Can you add a comment explaining what this does, e.g. something like

When x and y are extended, lower:
  avgfloor(x, y) -> (x + y) >> 1
  avgceil(x, y)  -> (x + y + 1) >> 1

Otherwise, lower to:
  avgfloor(x, y) -> (x >> 1) + (y >> 1) + (x & y & 1)
  avgceil(x, y)  -> (x >> 1) + (y >> 1) + ((x | y) & 1)
13416–13417

nit: this comment would be redundant if you address my other comment (to add a comment for the function itself)

13424–13426

nit:

SDValue ShiftOpA = DAG.getNode(ShiftOpc, dl, VT, OpA, ConstantOne);
SDValue ShiftOpB = DAG.getNode(ShiftOpc, dl, VT, OpB, ConstantOne);
hassnaa-arm marked 3 inline comments as done.Mar 7 2023, 4:42 AM
hassnaa-arm added inline comments.
llvm/test/CodeGen/AArch64/sve-hadd.ll
166 ↗(On Diff #502962)

Sorry, I don't understand what you mean by "full width"?

Add comments explaining what LowerAVG() does.

dmgreen added inline comments.Mar 7 2023, 5:10 AM
llvm/test/CodeGen/AArch64/sve-hadd.ll
166 ↗(On Diff #502962)

I just meant a multiple of 128 - a <vscale x 4 x i32>. There appear to be <vscale x 2 x ..> tests for lshr, but we should have all the other sizes too.

hassnaa-arm marked an inline comment as done.Mar 8 2023, 1:38 PM

Add test cases that use lshr.

@dmgreen Thanks for reviewing the patch. Do you have any further comments?

sdesmalen accepted this revision.Mar 9 2023, 6:14 AM

Thanks for the changes @hassnaa-arm. I'm satisfied with the patch now, so I'm removing my 'requesting changes'.
Unless @dmgreen has more comments on the tests, I'm happy for this patch to land.

This revision is now accepted and ready to land.Mar 9 2023, 6:14 AM
dmgreen accepted this revision.Mar 10 2023, 3:14 AM

Yeah nothing else from me. LGTM, thanks for the changes.

This revision was landed with ongoing or failed builds.Mar 13 2023, 12:01 PM
This revision was automatically updated to reflect the committed changes.