This is an archive of the discontinued LLVM Phabricator instance.


Differential D143283

[AArch64][SVE]: custom lower AVGFloor/AVGCeil.
ClosedPublic

Authored by hassnaa-arm on Feb 3 2023, 9:57 AM.


Details

Reviewers

david-arm
sdesmalen
efriedma
dmgreen

Commits

rG40a51e1afce9: [AArch64][SVE]: custom lower AVGFloor/AVGCeil.

Summary

-Lower AVGFloor(A, B) to:

SRL(A) + SRL(B) + (A&B)&1.

-Lower AVGCeil(A, B) to:

SRL(A) + SRL(B) + (A|B)&1.
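
The two expansions in the summary can be checked against a widened reference on a small type. The following is a minimal sketch for unsigned 8-bit values; the function names are hypothetical and are not part of the patch.

```cpp
#include <cstdint>

// Reference results: widen to 16 bits so the addition cannot overflow.
inline uint8_t avgFloorRef(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((uint16_t(a) + uint16_t(b)) >> 1);
}
inline uint8_t avgCeilRef(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((uint16_t(a) + uint16_t(b) + 1) >> 1);
}

// Expansions from the summary: SRL(A) + SRL(B) + ((A & B) & 1) and
// SRL(A) + SRL(B) + ((A | B) & 1). The result always fits in 8 bits.
inline uint8_t avgFloorExpanded(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a >> 1) + (b >> 1) + ((a & b) & 1));
}
inline uint8_t avgCeilExpanded(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a >> 1) + (b >> 1) + ((a | b) & 1));
}

// Exhaustively compare both forms over all 8-bit input pairs.
inline bool expansionsMatchReference() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b) {
      uint8_t x = static_cast<uint8_t>(a), y = static_cast<uint8_t>(b);
      if (avgFloorExpanded(x, y) != avgFloorRef(x, y)) return false;
      if (avgCeilExpanded(x, y) != avgCeilRef(x, y)) return false;
    }
  return true;
}
```

The expanded forms never need a wider type, which is the point of the lowering when no halving-add instruction is available.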

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,060 ms	x64 debian > MLIR.Examples/standalone::test.toy
	700 ms	x64 debian > MemProfiler-x86_64-linux-dynamic.TestCases::memprof_inline.c
	60,050 ms	x64 debian > ThreadSanitizer-x86_64.ThreadSanitizer-x86_64::restore_stack.cpp

Event Timeline

hassnaa-arm created this revision. Feb 3 2023, 9:57 AM

Herald added a project: Restricted Project. Feb 3 2023, 9:57 AM

Herald added a subscriber: hiraditya.

hassnaa-arm requested review of this revision. Feb 3 2023, 9:57 AM

Herald added a project: Restricted Project. Feb 3 2023, 9:57 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B211756: Diff 494676. Feb 3 2023, 11:05 AM

This patch doesn't seem like it is in a state to review yet (e.g. it doesn't have any tests), perhaps you can add 'WIP' to the title to make it clear that this is a Work In Progress?

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1420–1434 ↗	(On Diff #494676)	As a suggestion for future uses of these pattern matches, you could write this more compactly: if (match(Op1, m_SpecificInt(1)) && match(Op2, m_Add(m_Add(m_OneUse(m_Value(A)), m_OneUse(m_Value(B))), m_ConstantInt(C))) && (C->getZExtValue() == 0 || C->getZExtValue() == 1)) { ... }
1456 ↗	(On Diff #494676)	This transform is only profitable when the target has no explicit instructions for this pattern (e.g. like Arm's urhadd/srhadd) and the extends are expensive. The InstCombine pass transforms the IR to a canonical form and doesn't use the CostModel to decide whether a certain form is profitable or not. A better place to do this would be in a DAGCombine (to start, it's good enough for this to be target-specific, e.g. a combine in AArch64ISelLowering.cpp) where we can simply check if the target being compiled for has the urhadd/srhadd instructions, and if not, do the transform.

This revision now requires changes to proceed. Feb 6 2023, 12:58 AM

hassnaa-arm retitled this revision from [Transform][InstCombine]: transform lshr pattern. to [Transform][InstCombine]: transform lshr pattern. [WIP]. Feb 6 2023, 2:01 AM

Matt added a subscriber: Matt. Feb 6 2023, 12:16 PM

Move combining trunc shift and extend shift to AArch64

hassnaa-arm retitled this revision from [Transform][InstCombine]: transform lshr pattern. [WIP] to [AArch64][combine]: transform lshr pattern. [WIP]. Feb 7 2023, 10:28 AM

hassnaa-arm edited the summary of this revision.

Herald added a subscriber: kristof.beyls. Feb 7 2023, 10:28 AM

hassnaa-arm retitled this revision from [AArch64][combine]: transform lshr pattern. [WIP] to [AArch64][combine]: combine lshr pattern. [WIP]. Feb 7 2023, 10:28 AM

hassnaa-arm added inline comments. Feb 7 2023, 10:36 AM

llvm/test/CodeGen/AArch64/neon-lshr.ll

22 ↗

(On Diff #495593)

I investigated the generated DAG and the effect of my changes,
I found that my changes affected the DAG.
Here is the DAG just before the step of instruction selection:

SelectionDAG has 19 nodes:
  t0: ch,glue = EntryToken
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t4: i32,ch = CopyFromReg t0, Register:i32 %1
          t50: i32 = and t2, Constant:i32<254>
        t37: i32 = srl t50, Constant:i64<1>
          t52: i32 = and t4, Constant:i32<254>
        t42: i32 = srl t52, Constant:i64<1>
      t43: i32 = add t37, t42
        t39: i32 = or t2, t4
      t40: i32 = and t39, Constant:i32<1>
    t44: i32 = add t43, t40
  t17: ch,glue = CopyToReg t0, Register:i32 $w0, t44
  t18: ch = AArch64ISD::RET_FLAG t17, Register:i32 $w0, t17:1

Harbormaster completed remote builds in B212439: Diff 495593. Feb 7 2023, 12:04 PM

sdesmalen added inline comments. Feb 8 2023, 3:00 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17699	As I understand it, the point is to simplify the following: trunc(lshr(add(add(ext(a), ext(b)), 1), 1)) -> lshr(a, 1) + lshr(b, 1) + ((a | b) & 1) iff the type of ext(a) is not a legal type
17701	I see that in llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp it tries to combine this pattern into a simpler `AVGFLOORS/AVGFLOORU/AVGCEILS/AVGCEILU` node in the function `combineShiftToAVG`. Is there a way to re-use this existing mechanism and then Custom lower this node to the desired set of instructions? I even wonder if we could generalise this code (i.e. not specific to AArch64), when the nodes are set to Expand.
17701	You're testing for SVE2 (vector extension), but the test is using scalar types, not vector types. That doesn't seem entirely right?
18683	I don't believe this is a transform we want to make.

Add AArch64 implementation for custom-lowering AVGFloor/AVGCeil

hassnaa-arm retitled this revision from [AArch64][combine]: combine lshr pattern. [WIP] to [AArch64][SVE]: custom lower AVGFloor/AVGCeil. [WIP]. Feb 10 2023, 4:02 AM

hassnaa-arm edited the summary of this revision.

Herald added a reviewer: efriedma. Feb 10 2023, 4:02 AM

Herald added subscribers: psnobl, tschuett.

Remove neon-lshr.ll

Harbormaster completed remote builds in B213012: Diff 496419. Feb 10 2023, 4:59 AM

Remove old code that is not used now.

Please pre-commit the new testcases, so the changes are more visible.

The <vscale x 2 x i128> variations currently crash the compiler; I'd recommend fixing that before trying to optimize them.

The "expected" code for haddu_v2i32 is worse than what we currently generate.

Add precursory patch.

Harbormaster completed remote builds in B214097: Diff 497932. Feb 16 2023, 1:57 AM

Optimize the generated code by checking if the extended node was previously truncated.

Herald added a subscriber: dmgreen. Feb 16 2023, 9:02 AM

Harbormaster completed remote builds in B214175: Diff 498046. Feb 16 2023, 9:03 AM

Optimize the generated code by checking if the extended nodes were previously truncated.

Harbormaster completed remote builds in B214180: Diff 498050. Feb 16 2023, 9:26 AM

hassnaa-arm retitled this revision from [AArch64][SVE]: custom lower AVGFloor/AVGCeil. [WIP] to [AArch64][SVE]: custom lower AVGFloor/AVGCeil.. Feb 20 2023, 2:04 AM

sdesmalen added inline comments. Feb 20 2023, 3:41 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1034 ↗	(On Diff #498046)	This is probably better added as a separate DAGCombine that folds away the sign-extend-in-regs explicitly, because that is already performed by the avgfloor operation. avgfloors(sextinreg(x), sextinreg(y)) -> avgfloors(x, y)
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13392	You seem to be trying to optimise the case where one of the operands is a vector of ones, but I don't see any motivating tests for this? I would also expect these cases to be folded away already by existing DAGCombines.
llvm/test/CodeGen/AArch64/sve-avg_floor_ceil.ll
62 ↗	(On Diff #498050)	This doesn't look like an improvement, we don't really want to do this transform if it makes the resulting code worse. Do you know why this results in worse code?

hassnaa-arm added inline comments. Feb 20 2023, 4:37 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1034 ↗	(On Diff #498046)	Sorry, I don't understand this comment. Could you clarify it further? Why did you mention sign-extend-in-regs, and how is it related to this code?

sdesmalen added inline comments. Feb 20 2023, 4:52 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1034 ↗	(On Diff #498046)	The truncate + sign-extend will be combined into a sign-extend-in-reg. You can see that if you disable the code you added below on lines 1046-1055, and run the test `@hadd8_sext_asr`

hassnaa-arm added inline comments. Feb 20 2023, 5:11 AM

llvm/test/CodeGen/AArch64/sve-avg_floor_ceil.ll
62 ↗	(On Diff #498050)	In the new changes, the generated instructions are exactly the equivalent for AVGFloor, no additional instructions. I think nothing can be done for this case.

sdesmalen added inline comments. Feb 20 2023, 8:45 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1034 ↗	(On Diff #498046)	Given my other suggestion, please ignore my suggestion above to add a DAGCombine (which I'm not even sure is legal; the semantics of the AVGFLOOR operation aren't entirely clear to me still)
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13385	Should this be using an arithmetic shift when the operation is signed?
llvm/test/CodeGen/AArch64/sve-avg_floor_ceil.ll
62 ↗	(On Diff #498050)	For these cases (where the input is an unpacked vector that is promoted using an `and` (for the unsigned case) or `sign_extend_inreg` (for the signed case)) we can emit the original code when Custom lowering these nodes. That way we can mitigate the regression.
169 ↗	(On Diff #498050)	Should this use `ashr` (arithmetic shift, since the values s0s/s1s are signed)? Same question for other tests in this file.
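
The question about arithmetic versus logical shifts can be illustrated on 8-bit signed values. This is a sketch with hypothetical function names; it assumes that `>>` on a negative signed value is an arithmetic shift and that narrowing casts wrap modulo 256 (both guaranteed since C++20 and the behaviour of mainstream compilers before that).

```cpp
#include <cstdint>

// Reference signed floor average, widened to 16 bits.
inline int8_t avgFloorSRef(int8_t a, int8_t b) {
  return static_cast<int8_t>((int16_t(a) + int16_t(b)) >> 1);
}

// Expansion using arithmetic shifts (the SRA choice).
inline int8_t avgFloorSWithSra(int8_t a, int8_t b) {
  return static_cast<int8_t>((a >> 1) + (b >> 1) + ((a & b) & 1));
}

// Same expansion with logical shifts (the SRL choice), shown for contrast.
inline int8_t avgFloorSWithSrl(int8_t a, int8_t b) {
  uint8_t ua = static_cast<uint8_t>(a), ub = static_cast<uint8_t>(b);
  uint8_t r = static_cast<uint8_t>((ua >> 1) + (ub >> 1) + ((ua & ub) & 1));
  return static_cast<int8_t>(r);
}

// Exhaustively check that the SRA form matches the widened reference.
inline bool sraMatchesReference() {
  for (int a = -128; a <= 127; ++a)
    for (int b = -128; b <= 127; ++b) {
      int8_t x = static_cast<int8_t>(a), y = static_cast<int8_t>(b);
      if (avgFloorSWithSra(x, y) != avgFloorSRef(x, y)) return false;
    }
  return true;
}
```

With inputs -2 and 0 the SRA form gives -1, matching the reference, while the SRL form gives 127. The logical-shift form only goes wrong when exactly one operand is negative, which makes the bug easy to miss in spot checks.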

Check if it's better to emit the original code or custom lower AVGFloor/Ceil

Harbormaster completed remote builds in B214987: Diff 499126. Feb 21 2023, 5:33 AM

Change lshr to ashr for signed cases in the precursory patch.

Harbormaster completed remote builds in B214995: Diff 499135. Feb 21 2023, 5:53 AM

In the CEIL case, emit an ADD operation for the constant 1.

Harbormaster completed remote builds in B215306: Diff 499568. Feb 22 2023, 10:06 AM

sdesmalen added inline comments. Feb 23 2023, 1:57 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13390	This is missing a check to ensure that the `and` value is a mask, and that the masked 'type' is smaller than the size of the current type. Perhaps you can get some inspiration from `isExtendOrShiftOperand` which checks whether the operation is zero or sign-extended. In either case, it would be nice to have this logic in separate `isSignedExtended()` and `isZeroExtended()` functions.
13393	There are several uses of Node->getOpcode(); you can move that to a separate variable. It would also be nice to have some variables: IsSigned (for AVGFLOORS and AVGCEILS), IsFloor (for AVGFLOORS and AVGFLOORU), and ShiftOpc = IsSigned ? ISD::SRA : ISD::SRL. That way you can simplify the code a bit, without having to check the opcodes everywhere, and you can combine some of the if/else branches.
llvm/test/CodeGen/AArch64/sve-avg_floor_ceil.ll
4 ↗	(On Diff #499568)	Can you add these tests to the precursory patch, so that this patch only shows the diff?
1 ↗	(On Diff #499126)	The filename has both a dash (`-`) and underscores (`_`). Can you rename it to sve-avg-floor-ceil.ll ?
2 ↗	(On Diff #499126)	Can you add two RUN lines, one for -mattr=+sve and one for -mattr=+sve2 ?
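
The suggestion to derive IsSigned, IsFloor and ShiftOpc once from the opcode could look roughly like the sketch below. The enums are hypothetical stand-ins for the ISD opcodes, not LLVM's real definitions, and the struct is only there to make the example self-contained.

```cpp
// Hypothetical stand-ins for the ISD opcodes; not LLVM's real definitions.
enum class Opcode { AVGFLOORS, AVGFLOORU, AVGCEILS, AVGCEILU };
enum class ShiftKind { SRA, SRL };

// The three properties the review suggests computing once up front.
struct AvgTraits {
  bool IsSigned;
  bool IsFloor;
  ShiftKind ShiftOpc;
};

inline AvgTraits classify(Opcode Opc) {
  bool IsSigned = Opc == Opcode::AVGFLOORS || Opc == Opcode::AVGCEILS;
  bool IsFloor = Opc == Opcode::AVGFLOORS || Opc == Opcode::AVGFLOORU;
  // Signed averages need an arithmetic shift, unsigned ones a logical shift.
  ShiftKind ShiftOpc = IsSigned ? ShiftKind::SRA : ShiftKind::SRL;
  return {IsSigned, IsFloor, ShiftOpc};
}
```

Computing the flags once keeps the later branches free of repeated opcode comparisons.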

hassnaa-arm marked 4 inline comments as done. Feb 23 2023, 8:41 AM

hassnaa-arm added inline comments.

llvm/test/CodeGen/AArch64/sve-avg_floor_ceil.ll
4 ↗	(On Diff #499568)	Adding them causes a crash. The crash message is: "LLVM ERROR: Scalarization of scalable vectors is not supported." It crashes while legalising the result type of nxv2i128.

Enhance code readability.

Hello. Sorry for the delay in looking at this but I wasn't sure exactly what you were trying to do, and I've never been a huge fan of DAG combines that create the wrong node just to expand it later. It looks like for legal types this can lead to a nice decrease in instruction count though.

For smaller types I'm not sure that checking for individual opcodes for extension will work well. They could already be extending loads, for example. I've not thought about it too much yet, but as far as I understand from the original hadd (/avg) work it would probably be better to check that the top bits are known 0 for unsigned, and that there are >1 sign bits for the signed cases.

In any case, it would be good to see alive proofs for the transforms you are making.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
6108	This can be moved into the start of LowerAVG to make this function a little more regular. (I know it's not super regular already, but other parts already do the same thing).
13379	Can it use countTrailingOnes as opposed to countPopulation? Although I think these functions might be better removed and use computeKnownBits checks instead.
13393	This can just be called LowerAVG
13415	check should be Check
13416	Why is it OK to only check one of the operands?
13425	Doesn't need the else after the return, which might help simplify if this is checking the sign bits.
llvm/test/CodeGen/AArch64/sve-avg-floor-ceil.ll
3 ↗	(On Diff #499878)	There is already a test for hadds in sve in sve2-hadd. This test looks like a copy of it. It would probably be better to just use the existing test with an extra run line for SVE1 targets.
llvm/test/CodeGen/AArch64/sve-avg_floor_ceil.ll
4 ↗	(On Diff #499568)	Like Eli said it would be good to fix vscale x 1 vectorization so this wasn't a problem any more.

Remove sve-avgfloor testing file.
Add RUN line for sve to sve2-hadd
rename sve2-hadd to sve-hadd

hassnaa-arm added inline comments. Feb 28 2023, 4:53 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13379	I think using countTrailingOnes is much simpler than computeKnownBits because the operand and its value are already known.

sdesmalen added inline comments. Feb 28 2023, 5:46 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13379	I presume that @dmgreen meant using `computeKnownBits` on `Node` as a whole, not specifically on the splatted value so that you don't need to check for the opcode explicitly. See for example `checkZExtBool`, where it does: APInt RequredZero(SizeInBits, 0xFE); KnownBits Bits = DAG.computeKnownBits(Arg, 4); bool ZExtBool = (Bits.Zero & RequredZero) == RequredZero; You can probably do a similar thing for `IsSignExtended`, but then looking at the `Bits.One` instead of `Bits.Zero`.
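
The shape of the quoted `checkZExtBool` check, "every required bit is known to be zero", can be sketched without LLVM. The struct below is a toy model and an assumption made for illustration; it is not `llvm::KnownBits`, and it only models the result of masking an otherwise unknown value.

```cpp
#include <cstdint>

// Toy model of known bits for a 32-bit value: a set bit in Zero means that
// bit is known to be 0, a set bit in One means it is known to be 1.
// Hypothetical simplification, not llvm::KnownBits.
struct ToyKnownBits {
  uint32_t Zero;
  uint32_t One;
};

// Known bits of (x & Mask) when nothing is known about x:
// every bit cleared in Mask is known zero, no bit is known one.
inline ToyKnownBits knownBitsOfAnd(uint32_t Mask) {
  return {~Mask, 0u};
}

// Mirrors the quoted check: all bits in RequiredZero must be known zero.
inline bool allKnownZero(ToyKnownBits Bits, uint32_t RequiredZero) {
  return (Bits.Zero & RequiredZero) == RequiredZero;
}
```

The advantage the reviewers point to is that such a query does not care which opcode produced the value, so extending loads and other sources of known-zero bits are covered as well.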

Harbormaster completed remote builds in B216441: Diff 501093. Feb 28 2023, 6:01 AM

Use computeKnownBits for checking zeroExtend/signExtend.

Harbormaster completed remote builds in B216498: Diff 501178. Feb 28 2023, 10:40 AM

Use ComputeNumSignBits instead of ComputeKnownBits for SIGN_EXTEND_INREG ops.

hassnaa-arm added inline comments. Mar 1 2023, 7:42 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13390	I used DAG.ComputeNumSignBits instead of computeKnownBits, because computeKnownBits doesn't get any known bits for SIGN_EXTEND_INREG
llvm/test/CodeGen/AArch64/sve-hadd.ll
104	@dmgreen I tried to get alive proofs for this transform, https://alive2.llvm.org/ce/ but it seems there is something wrong. I'm not sure whether the problem is in my IR equivalent of the generated code or whether the transform itself is not correct.

Hi - I was just looking at the patch whilst you updated it! Please ignore any comments that don't apply any more.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13375	It might be better to make these lambdas inside LowerAVG. The names sound fairly generic but they are in practice tied to hadd nodes.
13376	Node.getValueType()...
13378	I believe this only needs 1 bit to be zero, so it could probably use Known.Zero.isSignBitSet()? There is a proof in https://alive2.llvm.org/ce/z/cEtdJa for rhadd.
13385	I think this one would be better to use computeNumSignBits. It's less important whether they are 0 or 1, just that they are the same as the top bit. Again I think it needs 1 bit to be valid, so computeNumSignBits(..) > 1 should do it. https://alive2.llvm.org/ce/z/VbxRs-. It would be good to see proofs for the other bit here too where the hadd is expanded to `SRL(A) + SRL(B) + (A&B)&1`
13403–13409	I would probably do: unsigned Opc = Op->getOpcode(); bool IsCeil = Opc == ISD::AVGCEILS || Opc == ISD::AVGCEILU; bool IsSigned = Opc == ISD::AVGFLOORS || Op->getOpcode() == ISD::AVGCEILS;
13416	&&, not ||, I would expect. A HADD is equivalent to `trunc(shr(add(ext(x),ext(y)), 1))`, it is not directly equivalent to `shr(add(x,y), 1)`. So we need to prove that turning it into the simpler version add+shift is better. It's not really "emit the original code".
llvm/test/CodeGen/AArch64/sve-hadd.ll
75	This ideally shouldn't change. Because the top bit isn't demanded instcombine will transform ashr into lshr, and we should be testing what the backend will see: https://godbolt.org/z/bvof8j7cc. I guess the lshr version isn't transformed after it gets promoted? It might be OK in this case.
104	Can you update the link? There are some other alive links that could help.

Harbormaster completed remote builds in B216718: Diff 501511. Mar 1 2023, 8:54 AM

hassnaa-arm marked 5 inline comments as done. Mar 1 2023, 12:13 PM

hassnaa-arm added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13378	Is that because the sign bit will be known to be zero ONLY for zero-extend case ?
llvm/test/CodeGen/AArch64/sve-hadd.ll
75	But in some cases, using lshr in the IR causes an extra AND instruction to be generated.

Check if both operands of AVG are extended, not just single one.

Harbormaster completed remote builds in B216806: Diff 501628. Mar 1 2023, 1:58 PM

dmgreen added inline comments. Mar 2 2023, 12:21 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13378	Oh right. I was assuming that it would use `if (!IsSigned && IsZeroExtended(OpA) && IsZeroExtended(OpB))` and `if (IsSigned && IsSignExtended(OpA) && IsSignExtended(OpB))`. An hadd / rhadd is defined as converting into an arbitrarily wide integer before doing the add/shift and then converting back. It just happens that it only needs 1 bit extra for that to be equivalent to any other type sizes. So we only need to check the top bit is known to be 0. (isSignBitSet doesn't really have anything to do with signedness here, it's just a way of checking the top bit).
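
The claim that a single spare bit is enough can be checked exhaustively on 8-bit values. The sketch below uses hypothetical function names and assumes arithmetic `>>` on negative values and modular narrowing casts (guaranteed since C++20).

```cpp
#include <cstdint>

// Narrow 8-bit add followed by a shift: the simple form without widening.
inline uint8_t narrowFloorU8(uint8_t a, uint8_t b) {
  uint8_t sum = static_cast<uint8_t>(a + b);  // wraps if a + b > 255
  return static_cast<uint8_t>(sum >> 1);
}

// Unsigned: when the top bit of both operands is zero (values below 128),
// a + b and a + b + 1 fit in 8 bits, so add+shift equals the wide average.
inline bool narrowUnsignedOk() {
  for (unsigned a = 0; a < 128; ++a)
    for (unsigned b = 0; b < 128; ++b) {
      uint8_t sum = static_cast<uint8_t>(a + b);
      uint8_t sumPlus1 = static_cast<uint8_t>(a + b + 1);
      if (static_cast<unsigned>(sum >> 1) != ((a + b) >> 1)) return false;
      if (static_cast<unsigned>(sumPlus1 >> 1) != ((a + b + 1) >> 1)) return false;
    }
  return true;
}

// Signed: with more than one sign bit each operand lies in [-64, 63],
// so a + b and a + b + 1 stay within [-128, 127].
inline bool narrowSignedOk() {
  for (int a = -64; a <= 63; ++a)
    for (int b = -64; b <= 63; ++b) {
      int8_t sum = static_cast<int8_t>(a + b);
      int8_t sumPlus1 = static_cast<int8_t>(a + b + 1);
      if ((sum >> 1) != ((a + b) >> 1)) return false;
      if ((sumPlus1 >> 1) != ((a + b + 1) >> 1)) return false;
    }
  return true;
}
```

Without the spare bit the narrow form is wrong: `narrowFloorU8(200, 100)` wraps the sum to 44 and returns 22 instead of 150, which is why the known-bits check guards the add+shift path.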

hassnaa-arm marked an inline comment as done. Mar 2 2023, 6:59 AM

While checking isZeroExtending, only checking the signbit of known Zeros is enough.

sdesmalen added inline comments. Mar 2 2023, 7:34 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13384	Move this closer to use.
13387	nit: unnecessary newline. Also, these lines exceed the 80char limit, so please run clang-format on this code.
13400	You can remove this condition, it's covered by DAG.ComputeNumSignBits.
13406–13422	You can combine these cases doing something like this: if ((IsSigned && IsSignExtended(OpA) && IsSignExtended(OpB)) || (!IsSigned && IsZeroExtended(OpA) && IsZeroExtended(OpB))) { ... unsigned ShiftOpc = IsSigned ? ISD::SRA : ISD::SRL; return DAG.getNode(ShiftOpc, dl, VT, Add, ConstantOne); }
13428–13432	nit: SDValue Tmp = DAG.getNode(IsCeil ? ISD::OR : ISD::AND, dl, VT, OpA, OpB);

Harbormaster completed remote builds in B216971: Diff 501857. Mar 2 2023, 8:05 AM

Enhance code readability.

Harbormaster completed remote builds in B217139: Diff 502090. Mar 3 2023, 4:29 AM

Thanks. Here are some alive proofs for the transform: https://alive2.llvm.org/ce/z/N6hwQY and https://alive2.llvm.org/ce/z/u_GjYJ.

Can you extend the testing to include both ashr and lshr versions? They should both be useful if we are custom legalizing the nodes. Otherwise I think this looks good.

llvm/test/CodeGen/AArch64/sve-hadd.ll
25	Can you copy these tests so there are versions with both lshr and ashr.

Add test cases for logical shr.

hassnaa-arm added inline comments. Mar 7 2023, 2:17 AM

llvm/test/CodeGen/AArch64/sve-hadd.ll
904	Here is a transformation proof for this case: https://alive2.llvm.org/ce/z/vPJi6R @dmgreen

I think it's worth adding tests for both the ashr and lshr versions, but otherwise I think this LGTM. Thanks

llvm/test/CodeGen/AArch64/sve-hadd.ll
166	Is there a lshr version of this one? It would be good to have some that are "full width" and use lshr. As instcombine will convert all the ashr to lshr it might be best to make sure there are tests for all the functions that were changed.
904	I think that's another case. This one would be https://alive2.llvm.org/ce/z/tp5NmX. From what I can tell they all look OK, according to alive.

Harbormaster completed remote builds in B217820: Diff 502962. Mar 7 2023, 3:00 AM

Thanks for all the changes @hassnaa-arm, I've just left some final minor comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

13375

Can you add a comment explaining what this does, e.g. something like

When x and y are extended, lower:
  avgfloor(x, y) -> (x + y) >> 1
  avgceil(x, y)  -> (x + y + 1) >> 1

Otherwise, lower to:
  avgfloor(x, y) -> (x >> 1) + (y >> 1) + (x & y & 1)
  avgceil(x, y)  -> (x >> 1) + (y >> 1) + ((x | y) & 1)

13404–13405

nit: this comment would be redundant if you address my other comment (to add a comment for the function itself)

13412–13414

nit:

SDValue ShiftOpA = DAG.getNode(ShiftOpc, dl, VT, OpA, ConstantOne);
SDValue ShiftOpB = DAG.getNode(ShiftOpc, dl, VT, OpB, ConstantOne);

hassnaa-arm marked 3 inline comments as done. Mar 7 2023, 4:42 AM

hassnaa-arm added inline comments.

llvm/test/CodeGen/AArch64/sve-hadd.ll
166	Sorry, I don't understand what you mean by "full width" ?

Add comments explaining what LowerAvg() does.

dmgreen added inline comments. Mar 7 2023, 5:10 AM

llvm/test/CodeGen/AArch64/sve-hadd.ll
166	I just meant a multiple of 128 - a <vscale x 4 x i32>. There appear to be `<vscale x 2 x ..>` tests for lshr, but we should have all the other sizes too.

Harbormaster completed remote builds in B217843: Diff 502991. Mar 7 2023, 5:47 AM

hassnaa-arm marked an inline comment as done. Mar 8 2023, 1:38 PM

Add test cases that use lshr.

Harbormaster completed remote builds in B218197: Diff 503500. Mar 8 2023, 2:34 PM

@dmgreen Thanks for reviewing the patch. Do you have any further comments ?

Thanks for the changes @hassnaa-arm, I'm satisfied with the patch now so removing my 'requesting changes'.
Unless @dmgreen has more comments on the tests, I'm happy for this patch to land.

This revision is now accepted and ready to land. Mar 9 2023, 6:14 AM

Yeah nothing else from me. LGTM, thanks for the changes.

This revision was landed with ongoing or failed builds. Mar 13 2023, 12:01 PM

Closed by commit rG40a51e1afce9: [AArch64][SVE]: custom lower AVGFloor/AVGCeil. (authored by Hassnaa Hamdi <hassnaa.hamdi@arm.com>).

This revision was automatically updated to reflect the committed changes.

Hassnaa Hamdi <hassnaa.hamdi@arm.com> added a commit: rG40a51e1afce9: [AArch64][SVE]: custom lower AVGFloor/AVGCeil..

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.h	1 line
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	69 lines
llvm/test/CodeGen/AArch64/sve-hadd.ll	1295 lines
llvm/test/CodeGen/AArch64/sve2-hadd.ll

Diff 503500

llvm/lib/Target/AArch64/AArch64ISelLowering.h

@@ ... @@ private:
   SDValue LowerTRUNCATE(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerVECREDUCE(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerATOMIC_LOAD_SUB(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerATOMIC_LOAD_AND(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerWindowsDYNAMIC_STACKALLOC(SDValue Op, SDValue Chain,
                                          SDValue &Size,
                                          SelectionDAG &DAG) const;
+  SDValue LowerAVG(SDValue Op, SelectionDAG &DAG, unsigned NewOp) const;

   SDValue LowerFixedLengthVectorIntDivideToSVE(SDValue Op,
                                                SelectionDAG &DAG) const;
   SDValue LowerFixedLengthVectorIntExtendToSVE(SDValue Op,
                                                SelectionDAG &DAG) const;
   SDValue LowerFixedLengthVectorLoadToSVE(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFixedLengthVectorMLoadToSVE(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerVECREDUCE_SEQ_FADD(SDValue ScalarOp, SelectionDAG &DAG) const;

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


@@ ... @@ for (auto VT : {MVT::nxv16i8, MVT::nxv8i16, MVT::nxv4i32, MVT::nxv2i64}) {
       setOperationAction(ISD::UADDSAT, VT, Legal);
       setOperationAction(ISD::SSUBSAT, VT, Legal);
       setOperationAction(ISD::USUBSAT, VT, Legal);
       setOperationAction(ISD::UREM, VT, Expand);
       setOperationAction(ISD::SREM, VT, Expand);
       setOperationAction(ISD::SDIVREM, VT, Expand);
       setOperationAction(ISD::UDIVREM, VT, Expand);

-      if (Subtarget->hasSVE2()) {
-        setOperationAction(ISD::AVGFLOORS, VT, Custom);
-        setOperationAction(ISD::AVGFLOORU, VT, Custom);
-        setOperationAction(ISD::AVGCEILS, VT, Custom);
-        setOperationAction(ISD::AVGCEILU, VT, Custom);
-      }
+      setOperationAction(ISD::AVGFLOORS, VT, Custom);
+      setOperationAction(ISD::AVGFLOORU, VT, Custom);
+      setOperationAction(ISD::AVGCEILS, VT, Custom);
+      setOperationAction(ISD::AVGCEILU, VT, Custom);
     }

     // Illegal unpacked integer vector types.
     for (auto VT : {MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32}) {
       setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);
       setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
     }

     // Legalize unpacked bitcasts to REINTERPRET_CAST.
@@ ... @@ case ISD::VSELECT:
     return LowerFixedLengthVectorSelectToSVE(Op, DAG);
   case ISD::ABS:
     return LowerABS(Op, DAG);
   case ISD::ABDS:
     return LowerToPredicatedOp(Op, DAG, AArch64ISD::ABDS_PRED);
   case ISD::ABDU:
     return LowerToPredicatedOp(Op, DAG, AArch64ISD::ABDU_PRED);
   case ISD::AVGFLOORS:
-    return LowerToPredicatedOp(Op, DAG, AArch64ISD::HADDS_PRED);
+    return LowerAVG(Op, DAG, AArch64ISD::HADDS_PRED);
   case ISD::AVGFLOORU:
-    return LowerToPredicatedOp(Op, DAG, AArch64ISD::HADDU_PRED);
+    return LowerAVG(Op, DAG, AArch64ISD::HADDU_PRED);
   case ISD::AVGCEILS:
-    return LowerToPredicatedOp(Op, DAG, AArch64ISD::RHADDS_PRED);
+    return LowerAVG(Op, DAG, AArch64ISD::RHADDS_PRED);
   case ISD::AVGCEILU:
-    return LowerToPredicatedOp(Op, DAG, AArch64ISD::RHADDU_PRED);
+    return LowerAVG(Op, DAG, AArch64ISD::RHADDU_PRED);
   case ISD::BITREVERSE:
     return LowerBitreverse(Op, DAG);
   case ISD::BSWAP:
     return LowerToPredicatedOp(Op, DAG, AArch64ISD::BSWAP_MERGE_PASSTHRU);
   case ISD::CTLZ:
     return LowerToPredicatedOp(Op, DAG, AArch64ISD::CTLZ_MERGE_PASSTHRU);
   case ISD::CTTZ:
     return LowerCTTZ(Op, DAG);
▲ Show 20 Lines • Show All 7,244 Lines • ▼ Show 20 Lines	SDValue AArch64TargetLowering::LowerWindowsDYNAMIC_STACKALLOC(
// from X15 here doesn't work at -O0, since it thinks that X15 is undefined		// from X15 here doesn't work at -O0, since it thinks that X15 is undefined
// here.		// here.

Size = DAG.getNode(ISD::SHL, dl, MVT::i64, Size,		Size = DAG.getNode(ISD::SHL, dl, MVT::i64, Size,
DAG.getConstant(4, dl, MVT::i64));		DAG.getConstant(4, dl, MVT::i64));
return Chain;		return Chain;
}		}

		// When x and y are extended, lower:
		dmgreenUnsubmitted Done Reply Inline Actions It might be better to make these lambdas inside LowerAVG. The names sound fairly generic but they are in practice tied to hadd nodes. dmgreen: It might be better to make these lambdas inside LowerAVG. The names sound fairly generic but…
		sdesmalenUnsubmitted Done Reply Inline Actions Can you add a comment explaining what this does, e.g. something like When x and y are extended, lower: avgfloor(x, y) -> (x + y) >> 1 avgceil(x, y) -> (x + y + 1) >> 1 Otherwise, lower to: avgfloor(x, y) -> (x >> 1) + (y >> 1) + (x && y && 1) avgceil(x, y) -> (x >> 1) + (y >> 1) + ((x \|\| y) && 1) sdesmalen: Can you add a comment explaining what this does, e.g. something like When x and y are…
		// avgfloor(x, y) -> (x + y) >> 1
		dmgreenUnsubmitted Done Reply Inline Actions Node.getValueType()... dmgreen: Node.getValueType()...
		// avgceil(x, y) -> (x + y + 1) >> 1

		dmgreenUnsubmitted Not Done Reply Inline Actions I believe this only needs 1 bit to be zero, so it could probably use Known.Zero.isSignBitSet()? There is a proof in https://alive2.llvm.org/ce/z/cEtdJa for rhadd. dmgreen: I believe this only needs 1 bit to be zero, so it could probably use Known.Zero.isSignBitSet()?
		hassnaa-armAuthorUnsubmitted Done Reply Inline Actions Is that because the sign bit will be known to be zero ONLY for zero-extend case ? hassnaa-arm: Is that because the sign bit will be known to be zero ONLY for zero-extend case ?
		dmgreenUnsubmitted Done Reply Inline Actions Oh right. I was assuming that it would use `if (!IsSigned && IsZeroExtended(OpA) && IsZeroExtended(OpB))` and `if (IsSigned && IsSignExtended(OpA) && IsSignExtended(OpB))` An hadd / rhadd is defined as converting into a arbitrary wide integer before doing the add/shift and then converting back. It just happens that it only needs 1 bit extra for that to be equivalent to any other type sizes. So we only need to check the top bit is known to be 0. (isSignBitSet doesnt really have anything to do with signedness here, its just a way of checking the top bit). dmgreen: Oh right. I was assuming that it would use `if (!IsSigned && IsZeroExtended(OpA) &&…
		// Otherwise, lower to:
		dmgreen (Done): Can it use countTrailingOnes as opposed to countPopulation? Although I think these functions might be better removed and use computeKnownBits checks instead.
		hassnaa-arm (Author, Done): I think using countTrailingOnes is much simpler than computeKnownBits because the operand and its value are already known.
		sdesmalen (Not Done): I presume that @dmgreen meant using `computeKnownBits` on `Node` as a whole, not specifically on the splatted value, so that you don't need to check for the opcode explicitly. See for example `checkZExtBool`, where it does:
		  APInt RequredZero(SizeInBits, 0xFE);
		  KnownBits Bits = DAG.computeKnownBits(Arg, 4);
		  bool ZExtBool = (Bits.Zero & RequredZero) == RequredZero;
		You can probably do a similar thing for `IsSignExtended`, but then looking at `Bits.One` instead of `Bits.Zero`.
		// avgfloor(x, y) -> (x >> 1) + (y >> 1) + (x & y & 1)
		// avgceil(x, y) -> (x >> 1) + (y >> 1) + ((x | y) & 1)
		SDValue AArch64TargetLowering::LowerAVG(SDValue Op, SelectionDAG &DAG,
		                                        unsigned NewOp) const {
		  if (Subtarget->hasSVE2())
		sdesmalen (Done): Move this closer to use.
		    return LowerToPredicatedOp(Op, DAG, NewOp);
		sdesmalen (Done): Should this be using an arithmetic shift when the operation is signed?
		dmgreen (Done): I think this one would be better to use computeNumSignBits. It's less important whether they are 0 or 1, just that they are the same as the top bit. Again I think it needs 1 bit to be valid, so computeNumSignBits(..) > 1 should do it. https://alive2.llvm.org/ce/z/VbxRs-. It would be good to see proofs for the other bit here too where the hadd is expanded to `SRL(A) + SRL(B) + (A&B)&1`

		  SDLoc dl(Op);
		sdesmalen (Done): nit: unnecessary newline. Also, these lines exceed the 80char limit, so please run clang-format on this code.
		  SDValue OpA = Op->getOperand(0);
		  SDValue OpB = Op->getOperand(1);
		  EVT VT = Op.getValueType();
		sdesmalen (Done): This is missing a check to ensure that the `and` value is a mask, and that the masked 'type' is smaller than the size of the current type. Perhaps you can get some inspiration from `isExtendOrShiftOperand` which checks whether the operation is zero or sign-extended. In either case, it would be nice to have this logic in separate `isSignedExtended()` and `isZeroExtended()` functions.
		hassnaa-arm (Author, Done): I used DAG.ComputeNumSignBits instead of computeKnownBits, because computeKnownBits doesn't get any known bits for SIGN_EXTEND_INREG.
		  bool IsCeil =
		      (Op->getOpcode() == ISD::AVGCEILS || Op->getOpcode() == ISD::AVGCEILU);
		sdesmalen (Done): You seem to be trying to optimise the case where one of the operands is a vector of ones, but I don't see any motivating tests for this? I would also expect these cases to be folded away already by existing DAGCombines.
		  bool IsSigned =
		sdesmalen (Done): There are several uses of Node->getOpcode(), you can move that to a separate variable. It would also be nice to have some variables:
		  IsSigned (for AVGFLOORS and AVGCEILS)
		  IsFloor (for AVGFLOORS and AVGFLOORU)
		  ShiftOpc = IsSigned ? ISD::SRA : ISD::SRL;
		That way you can simplify the code a bit, without having to check the opcodes everywhere, and you can combine some of the if/else branches.
		dmgreen (Done): This can just be called LowerAVG
		      (Op->getOpcode() == ISD::AVGFLOORS || Op->getOpcode() == ISD::AVGCEILS);
		  unsigned ShiftOpc = IsSigned ? ISD::SRA : ISD::SRL;

		  assert(VT.isScalableVector() && "Only expect to lower scalable vector op!");

		  auto IsZeroExtended = [&DAG](SDValue &Node) {
		    KnownBits Known = DAG.computeKnownBits(Node, 0);
		sdesmalen (Done): You can remove this condition, it's covered by DAG.ComputeNumSignBits.
		    return Known.Zero.isSignBitSet();
		  };

		  auto IsSignExtended = [&DAG](SDValue &Node) {
		    return (DAG.ComputeNumSignBits(Node, 0) > 1);
		sdesmalen (Done): nit: this comment would be redundant if you address my other comment (to add a comment for the function itself)
		  };

		  SDValue ConstantOne = DAG.getConstant(1, dl, VT);
		  if ((!IsSigned && IsZeroExtended(OpA) && IsZeroExtended(OpB)) ||
		dmgreen (Done): I would probably do:
		  unsigned Opc = Op->getOpcode();
		  bool IsCeil = Opc == ISD::AVGCEILS || Opc == ISD::AVGCEILU;
		  bool IsSigned = Opc == ISD::AVGFLOORS || Opc == ISD::AVGCEILS;
		      (IsSigned && IsSignExtended(OpA) && IsSignExtended(OpB))) {
		    SDValue Add = DAG.getNode(ISD::ADD, dl, VT, OpA, OpB);
		    if (IsCeil)
		      Add = DAG.getNode(ISD::ADD, dl, VT, Add, ConstantOne);
		    return DAG.getNode(ShiftOpc, dl, VT, Add, ConstantOne);
		sdesmalen (Done): nit:
		  SDValue ShiftOpA = DAG.getNode(ShiftOpc, dl, VT, OpA, ConstantOne);
		  SDValue ShiftOpB = DAG.getNode(ShiftOpc, dl, VT, OpB, ConstantOne);
		  }
		dmgreen (Done): check should be Check

		dmgreen (Not Done): Why is it OK to only check one of the operands?
		dmgreen (Done): &&, not ||, I would expect. A HADD is equivalent to `trunc(shr(add(ext(x),ext(y)), 1))`, it is not directly equivalent to `shr(add(x,y), 1)`. So we need to prove that turning it into the simpler version add+shift is better. It's not really "emit the original code".
		  SDValue ShiftOpA = DAG.getNode(ShiftOpc, dl, VT, OpA, ConstantOne);
		  SDValue ShiftOpB = DAG.getNode(ShiftOpc, dl, VT, OpB, ConstantOne);

		  SDValue tmp = DAG.getNode(IsCeil ? ISD::OR : ISD::AND, dl, VT, OpA, OpB);
		  tmp = DAG.getNode(ISD::AND, dl, VT, tmp, ConstantOne);
		  SDValue Add = DAG.getNode(ISD::ADD, dl, VT, ShiftOpA, ShiftOpB);
		sdesmalen (Done): You can combine these cases doing something like this:
		  if ((IsSigned && IsSignExtended(OpA) && IsSignExtended(OpB)) ||
		      (!IsSigned && IsZeroExtended(OpA) && IsZeroExtended(OpB))) {
		    ...
		    unsigned ShiftOpc = IsSigned ? ISD::SRA : ISD::SRL;
		    return DAG.getNode(ShiftOpc, dl, VT, Add, ConstantOne);
		  }
		  return DAG.getNode(ISD::ADD, dl, VT, Add, tmp);
		}
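
The "otherwise" expansion in the comment above can be sanity-checked outside LLVM. The following is only a sketch under stated assumptions: it models 8-bit lanes with plain integers, and the helper names (`floorHalf`, `refAvg`, `expandedAvg`, `expansionMatches`) are hypothetical and not part of the patch. It compares the expansion against the widened definition of avgfloor/avgceil for every pair of inputs in a range.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Floor division by two on a widened value. This stands in for an arithmetic
// (or, for non-negative inputs, logical) shift right by one.
static int floorHalf(int x) { return x >= 0 ? x / 2 : -((-x + 1) / 2); }

// Reference semantics: widen, add (plus one for ceil), halve.
static int refAvg(int a, int b, bool isCeil) {
  return floorHalf(a + b + (isCeil ? 1 : 0));
}

// Expansion from the patch comment (modelled, not the DAG code itself):
//   avgfloor(x, y) -> (x >> 1) + (y >> 1) + (x & y & 1)
//   avgceil(x, y)  -> (x >> 1) + (y >> 1) + ((x | y) & 1)
static int expandedAvg(int a, int b, bool isCeil) {
  // The low bit of the 8-bit two's-complement encoding equals the parity.
  unsigned lowA = static_cast<uint8_t>(a) & 1u;
  unsigned lowB = static_cast<uint8_t>(b) & 1u;
  unsigned carry = isCeil ? (lowA | lowB) : (lowA & lowB);
  return floorHalf(a) + floorHalf(b) + static_cast<int>(carry);
}

// Checks every pair of values in [lo, hi] for both floor and ceil.
static bool expansionMatches(int lo, int hi) {
  for (int a = lo; a <= hi; ++a)
    for (int b = lo; b <= hi; ++b)
      for (bool isCeil : {false, true})
        if (expandedAvg(a, b, isCeil) != refAvg(a, b, isCeil))
          return false;
  return true;
}
```

Calling `expansionMatches(0, 255)` covers the unsigned 8-bit case and `expansionMatches(-128, 127)` the signed one. Because each operand is halved before the add, the intermediate sum never needs an extra bit, which is why this form is usable when nothing is known about the operands.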

		dmgreen (Done): Doesn't need the else after the return, which might help simplify if this is checking the sign bits.
		SDValue
		AArch64TargetLowering::LowerDYNAMIC_STACKALLOC(SDValue Op,
		                                               SelectionDAG &DAG) const {
		  assert(Subtarget->isTargetWindows() &&
		         "Only Windows alloca probing supported");
		  SDLoc dl(Op);
		  // Get the inputs.
		sdesmalen (Done): nit: SDValue Tmp = DAG.getNode(IsCeil ? ISD::OR : ISD::AND, dl, VT, OpA, OpB);
		  SDNode *Node = Op.getNode();
		  SDValue Chain = Op.getOperand(0);
		  SDValue Size = Op.getOperand(1);
		  MaybeAlign Align =
		      cast<ConstantSDNode>(Op.getOperand(2))->getMaybeAlignValue();
		  EVT VT = Node->getValueType(0);

		  if (DAG.getMachineFunction().getFunction().hasFnAttribute(
▲ Show 20 Lines • Show All 4,250 Lines • ▼ Show 20 Lines	if (VT.isFixedLengthVector() && VT.is64BitVector() && N0.hasOneUse() &&
		      N0.getOpcode() == AArch64ISD::DUP) {
		    SDValue Op = N0.getOperand(0);
		    if (VT.getScalarType() == MVT::i32 &&
		        N0.getOperand(0).getValueType().getScalarType() == MVT::i64)
		      Op = DAG.getNode(ISD::TRUNCATE, SDLoc(N), MVT::i32, Op);
		    return DAG.getNode(N0.getOpcode(), SDLoc(N), VT, Op);
		  }

		  return SDValue();
		sdesmalen (Not Done): As I understand it, the point is to simplify the following: trunc(lshr(add(add(ext(a), ext(b)), 1), 1)) -> lshr(a, 1) + lshr(b, 1) + ((a | b) & 1) iff the type of ext(a) is not a legal type
		}

		sdesmalen (Not Done): I see that in llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp it tries to combine this pattern into a simpler `AVGFLOORS/AVGFLOORU/AVGCEILS/AVGCEILU` node in the function `combineShiftToAVG`. Is there a way to re-use this existing mechanism and then Custom lower this node to the desired set of instructions? I even wonder if we could generalise this code (i.e. not specific to AArch64), when the nodes are set to Expand.
		sdesmalen (Not Done): You're testing for SVE2 (vector extension), but the test is using scalar types, not vector types. That doesn't seem entirely right?
		// Check an node is an extend or shift operand
		static bool isExtendOrShiftOperand(SDValue N) {
		  unsigned Opcode = N.getOpcode();
		  if (Opcode == ISD::SIGN_EXTEND || Opcode == ISD::SIGN_EXTEND_INREG ||
		      Opcode == ISD::ZERO_EXTEND || Opcode == ISD::ANY_EXTEND) {
		    EVT SrcVT;
		    if (Opcode == ISD::SIGN_EXTEND_INREG)
		      SrcVT = cast<VTSDNode>(N.getOperand(1))->getVT();
▲ Show 20 Lines • Show All 965 Lines • ▼ Show 20 Lines	if (!DCI.isBeforeLegalizeOps() && N->getOpcode() == ISD::ZERO_EXTEND &&
		    return DAG.getNode(ISD::ZERO_EXTEND, SDLoc(N), N->getValueType(0), NewABD);
		  }

		  if (N->getValueType(0).isFixedLengthVector() &&
		      N->getOpcode() == ISD::SIGN_EXTEND &&
		      N->getOperand(0)->getOpcode() == ISD::SETCC)
		    return performSignExtendSetCCCombine(N, DCI, DAG);

		  return SDValue();
		sdesmalen (Not Done): I don't believe this is a transform we want to make.
		}

		static SDValue splitStoreSplat(SelectionDAG &DAG, StoreSDNode &St,
		                               SDValue SplatVal, unsigned NumVecElts) {
		  assert(!St.isTruncatingStore() && "cannot split truncating vector store");
		  Align OrigAlignment = St.getAlign();
		  unsigned EltOffset = SplatVal.getValueType().getSizeInBits() / 8;

▲ Show 20 Lines • Show All 5,855 Lines • Show Last 20 Lines
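
The review thread on `LowerAVG` argues that the plain add+shift form is only valid when one spare bit is available: the top bit known zero for the unsigned case, or more than one sign bit for the signed case. The sketch below illustrates that claim on 8-bit values. It is an illustration under assumptions, not LLVM code: `topBitIsZero` and `hasSpareSignBit` are hypothetical stand-ins for the `Known.Zero.isSignBitSet()` and `ComputeNumSignBits(...) > 1` checks, and the other helper names are likewise invented here.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for Known.Zero.isSignBitSet(): the top bit is known zero.
static bool topBitIsZero(uint8_t x) { return (x & 0x80u) == 0; }

// Toy stand-in for ComputeNumSignBits(x) > 1 on an 8-bit value: the top two
// bits agree, i.e. the value lies in [-64, 63].
static bool hasSpareSignBit(int x) { return x >= -64 && x <= 63; }

// Unsigned (x + y [+ 1]) >> 1 done entirely in 8 bits; the add wraps mod 256.
static uint8_t narrowAvgU(uint8_t a, uint8_t b, bool isCeil) {
  uint8_t sum = static_cast<uint8_t>(a + b + (isCeil ? 1 : 0));
  return static_cast<uint8_t>(sum >> 1);
}

// Reference: the same computation with enough headroom that nothing wraps.
static unsigned wideAvgU(unsigned a, unsigned b, bool isCeil) {
  return (a + b + (isCeil ? 1u : 0u)) >> 1;
}

// With both top bits clear, the 8-bit add cannot wrap, so the narrow
// add+shift agrees with the wide reference for every such input pair.
static bool unsignedFastPathIsSafe() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b) {
      uint8_t na = static_cast<uint8_t>(a), nb = static_cast<uint8_t>(b);
      if (!topBitIsZero(na) || !topBitIsZero(nb))
        continue;
      for (int c = 0; c < 2; ++c)
        if (narrowAvgU(na, nb, c != 0) != wideAvgU(a, b, c != 0))
          return false;
    }
  return true;
}

// With a spare sign bit on both operands, a + b (+ 1) stays inside the signed
// 8-bit range, so an 8-bit add followed by an arithmetic shift is exact.
static bool signedFastPathIsSafe() {
  for (int a = -128; a <= 127; ++a)
    for (int b = -128; b <= 127; ++b) {
      if (!hasSpareSignBit(a) || !hasSpareSignBit(b))
        continue;
      if (a + b < -128 || a + b + 1 > 127)
        return false;
    }
  return true;
}
```

Without the precondition the narrow form is wrong: `narrowAvgU(200, 100, false)` wraps to 22, whereas the widened average is 150. That is the case the shift-first expansion has to cover.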

llvm/test/CodeGen/AArch64/sve-hadd.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mattr=+sve | FileCheck %s -check-prefixes=CHECK,SVE
				; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mattr=+sve2 | FileCheck %s -check-prefixes=CHECK,SVE2

				define <vscale x 2 x i64> @hadds_v2i64(<vscale x 2 x i64> %s0, <vscale x 2 x i64> %s1) {
				; SVE-LABEL: hadds_v2i64:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: asr z2.d, z1.d, #1
				; SVE-NEXT: asr z3.d, z0.d, #1
				; SVE-NEXT: and z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.d, z3.d, z2.d
				; SVE-NEXT: and z0.d, z0.d, #0x1
				; SVE-NEXT: add z0.d, z1.d, z0.d
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: hadds_v2i64:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.d
				; SVE2-NEXT: shadd z0.d, p0/m, z0.d, z1.d
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 2 x i64> %s0 to <vscale x 2 x i128>
				%s1s = sext <vscale x 2 x i64> %s1 to <vscale x 2 x i128>
				%m = add nsw <vscale x 2 x i128> %s0s, %s1s
				%s = ashr <vscale x 2 x i128> %m, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
				dmgreen (Not Done): Can you copy these tests so there are versions with both lshr and ashr.
				%s2 = trunc <vscale x 2 x i128> %s to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %s2
				}

				define <vscale x 2 x i64> @hadds_v2i64_lsh(<vscale x 2 x i64> %s0, <vscale x 2 x i64> %s1) {
				; SVE-LABEL: hadds_v2i64_lsh:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: asr z2.d, z1.d, #1
				; SVE-NEXT: asr z3.d, z0.d, #1
				; SVE-NEXT: and z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.d, z3.d, z2.d
				; SVE-NEXT: and z0.d, z0.d, #0x1
				; SVE-NEXT: add z0.d, z1.d, z0.d
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: hadds_v2i64_lsh:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.d
				; SVE2-NEXT: shadd z0.d, p0/m, z0.d, z1.d
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 2 x i64> %s0 to <vscale x 2 x i128>
				%s1s = sext <vscale x 2 x i64> %s1 to <vscale x 2 x i128>
				%m = add nsw <vscale x 2 x i128> %s0s, %s1s
				%s = lshr <vscale x 2 x i128> %m, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
				%s2 = trunc <vscale x 2 x i128> %s to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %s2
				}

				define <vscale x 2 x i64> @haddu_v2i64(<vscale x 2 x i64> %s0, <vscale x 2 x i64> %s1) {
				; SVE-LABEL: haddu_v2i64:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: lsr z2.d, z1.d, #1
				; SVE-NEXT: lsr z3.d, z0.d, #1
				; SVE-NEXT: and z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.d, z3.d, z2.d
				; SVE-NEXT: and z0.d, z0.d, #0x1
				; SVE-NEXT: add z0.d, z1.d, z0.d
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: haddu_v2i64:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.d
				; SVE2-NEXT: uhadd z0.d, p0/m, z0.d, z1.d
				; SVE2-NEXT: ret
				entry:
				%s0s = zext <vscale x 2 x i64> %s0 to <vscale x 2 x i128>
				%s1s = zext <vscale x 2 x i64> %s1 to <vscale x 2 x i128>
				%m = add nuw nsw <vscale x 2 x i128> %s0s, %s1s
				%s = lshr <vscale x 2 x i128> %m, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
				dmgreen (Not Done): This ideally shouldn't change. Because the top bit isn't demanded instcombine will transform ashr into lshr, and we should be testing what the backend will see: https://godbolt.org/z/bvof8j7cc. I guess the lshr version isn't transformed after it gets promoted? It might be OK in this case.
				hassnaa-arm (Author, Done): But in some cases, using lshr in the IR causes an extra AND instruction to be generated.
				%s2 = trunc <vscale x 2 x i128> %s to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %s2
				}
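
dmgreen's remark that the top bit is not demanded can be checked on a small model. The sketch below assumes 8-bit operands sign-extended to 16 bits (the tests use wider types), and the helper names are hypothetical. It shows that after the final truncation, an arithmetic and a logical shift by one give the same result, since the two shifts differ only in the topmost bit of the wide value.

```cpp
#include <cassert>
#include <cstdint>

// sext i8 -> i16, kept as raw bits in a uint16_t.
static uint16_t sext8to16(int8_t x) {
  return static_cast<uint16_t>(static_cast<int16_t>(x));
}

// trunc(shift(add(sext(a), sext(b)), 1)) with either kind of shift.
static uint8_t halvingAddViaWide(int8_t a, int8_t b, bool arithmeticShift) {
  uint16_t sum = static_cast<uint16_t>(sext8to16(a) + sext8to16(b));
  uint16_t shifted = static_cast<uint16_t>(sum >> 1); // lshr
  if (arithmeticShift) // ashr: copy the old sign bit back into bit 15
    shifted = static_cast<uint16_t>(shifted | (sum & 0x8000u));
  return static_cast<uint8_t>(shifted & 0xFFu); // trunc i16 -> i8
}

// Exhaustively compares the ashr and lshr variants over all 8-bit inputs.
static bool shiftKindIsIrrelevantAfterTrunc() {
  for (int a = -128; a <= 127; ++a)
    for (int b = -128; b <= 127; ++b) {
      int8_t na = static_cast<int8_t>(a), nb = static_cast<int8_t>(b);
      if (halvingAddViaWide(na, nb, true) != halvingAddViaWide(na, nb, false))
        return false;
    }
  return true;
}
```

This only models the case where the wide type is legal for the whole computation. The thread notes that the generated code can still differ once the types are promoted.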

				define <vscale x 2 x i32> @hadds_v2i32(<vscale x 2 x i32> %s0, <vscale x 2 x i32> %s1) {
				; SVE-LABEL: hadds_v2i32:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: ptrue p0.d
				; SVE-NEXT: sxtw z0.d, p0/m, z0.d
				; SVE-NEXT: adr z0.d, [z0.d, z1.d, sxtw]
				; SVE-NEXT: asr z0.d, z0.d, #1
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: hadds_v2i32:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.d
				; SVE2-NEXT: sxtw z0.d, p0/m, z0.d
				; SVE2-NEXT: sxtw z1.d, p0/m, z1.d
				; SVE2-NEXT: shadd z0.d, p0/m, z0.d, z1.d
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 2 x i32> %s0 to <vscale x 2 x i64>
				%s1s = sext <vscale x 2 x i32> %s1 to <vscale x 2 x i64>
				%m = add nsw <vscale x 2 x i64> %s0s, %s1s
				%s = ashr <vscale x 2 x i64> %m, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
				%s2 = trunc <vscale x 2 x i64> %s to <vscale x 2 x i32>
				ret <vscale x 2 x i32> %s2
				}

				hassnaa-arm (Author, Done): @dmgreen I tried to get alive proofs for this transform, https://alive2.llvm.org/ce/ but it seems there is something wrong. I'm not sure whether the problem is in my IR equivalent of the generated code or whether the transform itself is incorrect.
				dmgreen (Not Done): Can you update the link? There are some other alive links that could help.
				define <vscale x 2 x i32> @hadds_v2i32_lsh(<vscale x 2 x i32> %s0, <vscale x 2 x i32> %s1) {
				; CHECK-LABEL: hadds_v2i32_lsh:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: sxtw z0.d, p0/m, z0.d
				; CHECK-NEXT: adr z0.d, [z0.d, z1.d, sxtw]
				; CHECK-NEXT: lsr z0.d, z0.d, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = sext <vscale x 2 x i32> %s0 to <vscale x 2 x i64>
				%s1s = sext <vscale x 2 x i32> %s1 to <vscale x 2 x i64>
				%m = add nsw <vscale x 2 x i64> %s0s, %s1s
				%s = lshr <vscale x 2 x i64> %m, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
				%s2 = trunc <vscale x 2 x i64> %s to <vscale x 2 x i32>
				ret <vscale x 2 x i32> %s2
				}

				define <vscale x 2 x i32> @haddu_v2i32(<vscale x 2 x i32> %s0, <vscale x 2 x i32> %s1) {
				; SVE-LABEL: haddu_v2i32:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: and z0.d, z0.d, #0xffffffff
				; SVE-NEXT: adr z0.d, [z0.d, z1.d, uxtw]
				; SVE-NEXT: lsr z0.d, z0.d, #1
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: haddu_v2i32:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.d
				; SVE2-NEXT: and z0.d, z0.d, #0xffffffff
				; SVE2-NEXT: and z1.d, z1.d, #0xffffffff
				; SVE2-NEXT: uhadd z0.d, p0/m, z0.d, z1.d
				; SVE2-NEXT: ret
				entry:
				%s0s = zext <vscale x 2 x i32> %s0 to <vscale x 2 x i64>
				%s1s = zext <vscale x 2 x i32> %s1 to <vscale x 2 x i64>
				%m = add nuw nsw <vscale x 2 x i64> %s0s, %s1s
				%s = lshr <vscale x 2 x i64> %m, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
				%s2 = trunc <vscale x 2 x i64> %s to <vscale x 2 x i32>
				ret <vscale x 2 x i32> %s2
				}

				define <vscale x 4 x i32> @hadds_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {
				; SVE-LABEL: hadds_v4i32:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: asr z2.s, z1.s, #1
				; SVE-NEXT: asr z3.s, z0.s, #1
				; SVE-NEXT: and z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.s, z3.s, z2.s
				; SVE-NEXT: and z0.s, z0.s, #0x1
				; SVE-NEXT: add z0.s, z1.s, z0.s
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: hadds_v4i32:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.s
				; SVE2-NEXT: shadd z0.s, p0/m, z0.s, z1.s
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>
				%s1s = sext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>
				%m = add nsw <vscale x 4 x i64> %s0s, %s1s
				%s = ashr <vscale x 4 x i64> %m, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
				dmgreen (Not Done): Is there a lshr version of this one? It would be good to have some that are "full width" and use lshr. As instcombine will convert all the ashr to lshr it might be best to make sure there are tests for all the functions that were changed.
				hassnaa-arm (Author, Done): Sorry, I don't understand what you mean by "full width"?
				dmgreen (Done): I just meant a multiple of 128 - a <vscale x 4 x i32>. There appear to be `<vscale x 2 x ..>` tests for lshr, but we should have all the other sizes too.
				%s2 = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %s2
				}
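
As a rough illustration of the "full width" wording in the thread above, the sketch below classifies a scalable vector type by its minimum size in bits. The struct and function names are hypothetical, and the only rule encoded is the one dmgreen states (a multiple of 128).

```cpp
#include <cassert>

// Hypothetical description of <vscale x MinElts x iEltBits>.
struct ScalableVecTy {
  unsigned MinElts;
  unsigned EltBits;
};

// Size of the type when vscale == 1.
constexpr unsigned minSizeInBits(ScalableVecTy T) {
  return T.MinElts * T.EltBits;
}

// "Full width" in the sense used in the review: a non-zero multiple of 128.
constexpr bool isFullWidth(ScalableVecTy T) {
  return minSizeInBits(T) != 0 && minSizeInBits(T) % 128 == 0;
}
```

Under this rule `<vscale x 4 x i32>` and `<vscale x 2 x i64>` are full width, while `<vscale x 2 x i32>` is not. That lines up with the CHECK lines in this file, where the `v2i32` tests operate on `.d` lanes after an explicit extend.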

				define <vscale x 4 x i32> @hadds_v4i32_lsh(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {
				; SVE-LABEL: hadds_v4i32_lsh:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: asr z2.s, z1.s, #1
				; SVE-NEXT: asr z3.s, z0.s, #1
				; SVE-NEXT: and z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.s, z3.s, z2.s
				; SVE-NEXT: and z0.s, z0.s, #0x1
				; SVE-NEXT: add z0.s, z1.s, z0.s
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: hadds_v4i32_lsh:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.s
				; SVE2-NEXT: shadd z0.s, p0/m, z0.s, z1.s
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>
				%s1s = sext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>
				%m = add nsw <vscale x 4 x i64> %s0s, %s1s
				%s = lshr <vscale x 4 x i64> %m, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
				%s2 = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %s2
				}

				define <vscale x 4 x i32> @haddu_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {
				; SVE-LABEL: haddu_v4i32:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: lsr z2.s, z1.s, #1
				; SVE-NEXT: lsr z3.s, z0.s, #1
				; SVE-NEXT: and z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.s, z3.s, z2.s
				; SVE-NEXT: and z0.s, z0.s, #0x1
				; SVE-NEXT: add z0.s, z1.s, z0.s
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: haddu_v4i32:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.s
				; SVE2-NEXT: uhadd z0.s, p0/m, z0.s, z1.s
				; SVE2-NEXT: ret
				entry:
				%s0s = zext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>
				%s1s = zext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>
				%m = add nuw nsw <vscale x 4 x i64> %s0s, %s1s
				%s = lshr <vscale x 4 x i64> %m, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
				%s2 = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %s2
				}

				define <vscale x 2 x i16> @hadds_v2i16(<vscale x 2 x i16> %s0, <vscale x 2 x i16> %s1) {
				; SVE-LABEL: hadds_v2i16:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: ptrue p0.d
				; SVE-NEXT: sxth z0.d, p0/m, z0.d
				; SVE-NEXT: sxth z1.d, p0/m, z1.d
				; SVE-NEXT: add z0.d, z0.d, z1.d
				; SVE-NEXT: asr z0.d, z0.d, #1
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: hadds_v2i16:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.d
				; SVE2-NEXT: sxth z0.d, p0/m, z0.d
				; SVE2-NEXT: sxth z1.d, p0/m, z1.d
				; SVE2-NEXT: shadd z0.d, p0/m, z0.d, z1.d
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 2 x i16> %s0 to <vscale x 2 x i32>
				%s1s = sext <vscale x 2 x i16> %s1 to <vscale x 2 x i32>
				%m = add nsw <vscale x 2 x i32> %s0s, %s1s
				%s = ashr <vscale x 2 x i32> %m, shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
				%s2 = trunc <vscale x 2 x i32> %s to <vscale x 2 x i16>
				ret <vscale x 2 x i16> %s2
				}

				define <vscale x 2 x i16> @hadds_v2i16_lsh(<vscale x 2 x i16> %s0, <vscale x 2 x i16> %s1) {
				; CHECK-LABEL: hadds_v2i16_lsh:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: sxth z0.d, p0/m, z0.d
				; CHECK-NEXT: sxth z1.d, p0/m, z1.d
				; CHECK-NEXT: add z0.d, z0.d, z1.d
				; CHECK-NEXT: and z0.d, z0.d, #0xffffffff
				; CHECK-NEXT: lsr z0.d, z0.d, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = sext <vscale x 2 x i16> %s0 to <vscale x 2 x i32>
				%s1s = sext <vscale x 2 x i16> %s1 to <vscale x 2 x i32>
				%m = add nsw <vscale x 2 x i32> %s0s, %s1s
				%s = lshr <vscale x 2 x i32> %m, shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
				%s2 = trunc <vscale x 2 x i32> %s to <vscale x 2 x i16>
				ret <vscale x 2 x i16> %s2
				}

				define <vscale x 2 x i16> @haddu_v2i16(<vscale x 2 x i16> %s0, <vscale x 2 x i16> %s1) {
				; SVE-LABEL: haddu_v2i16:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: and z0.d, z0.d, #0xffff
				; SVE-NEXT: and z1.d, z1.d, #0xffff
				; SVE-NEXT: add z0.d, z0.d, z1.d
				; SVE-NEXT: lsr z0.d, z0.d, #1
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: haddu_v2i16:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.d
				; SVE2-NEXT: and z0.d, z0.d, #0xffff
				; SVE2-NEXT: and z1.d, z1.d, #0xffff
				; SVE2-NEXT: uhadd z0.d, p0/m, z0.d, z1.d
				; SVE2-NEXT: ret
				entry:
				%s0s = zext <vscale x 2 x i16> %s0 to <vscale x 2 x i32>
				%s1s = zext <vscale x 2 x i16> %s1 to <vscale x 2 x i32>
				%m = add nuw nsw <vscale x 2 x i32> %s0s, %s1s
				%s = lshr <vscale x 2 x i32> %m, shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
				%s2 = trunc <vscale x 2 x i32> %s to <vscale x 2 x i16>
				ret <vscale x 2 x i16> %s2
				}

				define <vscale x 4 x i16> @hadds_v4i16(<vscale x 4 x i16> %s0, <vscale x 4 x i16> %s1) {
				; SVE-LABEL: hadds_v4i16:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: ptrue p0.s
				; SVE-NEXT: sxth z0.s, p0/m, z0.s
				; SVE-NEXT: sxth z1.s, p0/m, z1.s
				; SVE-NEXT: add z0.s, z0.s, z1.s
				; SVE-NEXT: asr z0.s, z0.s, #1
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: hadds_v4i16:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.s
				; SVE2-NEXT: sxth z0.s, p0/m, z0.s
				; SVE2-NEXT: sxth z1.s, p0/m, z1.s
				; SVE2-NEXT: shadd z0.s, p0/m, z0.s, z1.s
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 4 x i16> %s0 to <vscale x 4 x i32>
				%s1s = sext <vscale x 4 x i16> %s1 to <vscale x 4 x i32>
				%m = add nsw <vscale x 4 x i32> %s0s, %s1s
				%s = ashr <vscale x 4 x i32> %m, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				%s2 = trunc <vscale x 4 x i32> %s to <vscale x 4 x i16>
				ret <vscale x 4 x i16> %s2
				}

				define <vscale x 4 x i16> @hadds_v4i16_lsh(<vscale x 4 x i16> %s0, <vscale x 4 x i16> %s1) {
				; CHECK-LABEL: hadds_v4i16_lsh:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: sxth z0.s, p0/m, z0.s
				; CHECK-NEXT: sxth z1.s, p0/m, z1.s
				; CHECK-NEXT: add z0.s, z0.s, z1.s
				; CHECK-NEXT: lsr z0.s, z0.s, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = sext <vscale x 4 x i16> %s0 to <vscale x 4 x i32>
				%s1s = sext <vscale x 4 x i16> %s1 to <vscale x 4 x i32>
				%m = add nsw <vscale x 4 x i32> %s0s, %s1s
				%s = lshr <vscale x 4 x i32> %m, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				%s2 = trunc <vscale x 4 x i32> %s to <vscale x 4 x i16>
				ret <vscale x 4 x i16> %s2
				}

				define <vscale x 4 x i16> @haddu_v4i16(<vscale x 4 x i16> %s0, <vscale x 4 x i16> %s1) {
				; SVE-LABEL: haddu_v4i16:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: and z0.s, z0.s, #0xffff
				; SVE-NEXT: and z1.s, z1.s, #0xffff
				; SVE-NEXT: add z0.s, z0.s, z1.s
				; SVE-NEXT: lsr z0.s, z0.s, #1
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: haddu_v4i16:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.s
				; SVE2-NEXT: and z0.s, z0.s, #0xffff
				; SVE2-NEXT: and z1.s, z1.s, #0xffff
				; SVE2-NEXT: uhadd z0.s, p0/m, z0.s, z1.s
				; SVE2-NEXT: ret
				entry:
				%s0s = zext <vscale x 4 x i16> %s0 to <vscale x 4 x i32>
				%s1s = zext <vscale x 4 x i16> %s1 to <vscale x 4 x i32>
				%m = add nuw nsw <vscale x 4 x i32> %s0s, %s1s
				%s = lshr <vscale x 4 x i32> %m, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				%s2 = trunc <vscale x 4 x i32> %s to <vscale x 4 x i16>
				ret <vscale x 4 x i16> %s2
				}

				define <vscale x 8 x i16> @hadds_v8i16(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {
				; SVE-LABEL: hadds_v8i16:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: asr z2.h, z1.h, #1
				; SVE-NEXT: asr z3.h, z0.h, #1
				; SVE-NEXT: and z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.h, z3.h, z2.h
				; SVE-NEXT: and z0.h, z0.h, #0x1
				; SVE-NEXT: add z0.h, z1.h, z0.h
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: hadds_v8i16:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.h
				; SVE2-NEXT: shadd z0.h, p0/m, z0.h, z1.h
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>
				%s1s = sext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>
				%m = add nsw <vscale x 8 x i32> %s0s, %s1s
				%s = ashr <vscale x 8 x i32> %m, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
				%s2 = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %s2
				}

				define <vscale x 8 x i16> @hadds_v8i16_lsh(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {
				; SVE-LABEL: hadds_v8i16_lsh:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: asr z2.h, z1.h, #1
				; SVE-NEXT: asr z3.h, z0.h, #1
				; SVE-NEXT: and z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.h, z3.h, z2.h
				; SVE-NEXT: and z0.h, z0.h, #0x1
				; SVE-NEXT: add z0.h, z1.h, z0.h
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: hadds_v8i16_lsh:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.h
				; SVE2-NEXT: shadd z0.h, p0/m, z0.h, z1.h
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>
				%s1s = sext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>
				%m = add nsw <vscale x 8 x i32> %s0s, %s1s
				%s = lshr <vscale x 8 x i32> %m, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
				%s2 = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %s2
				}
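
The `_lsh` variants shift with `lshr` rather than `ashr`, yet the SVE2 checks above still select `shadd`. A minimal sketch of why that can be sound, using Python integers as a scaled-down i8 to i16 analog of this test: the two shifts differ only in the top bit of the wide value, and the `trunc` discards that bit. The helper names and the narrower width are assumptions made for illustration and are not part of the patch.

```python
BITS = 8                       # narrow lane width (scaled-down stand-in for i16)
WIDE = 2 * BITS                # extended width used by the IR pattern
NARROW_MASK = (1 << BITS) - 1
WIDE_MASK = (1 << WIDE) - 1


def trunc_lshr(a: int, b: int) -> int:
    """trunc(lshr(sext(a) + sext(b), 1)), returned as a BITS-bit pattern."""
    wide = (a + b) & WIDE_MASK          # WIDE-bit two's complement sum
    return (wide >> 1) & NARROW_MASK    # logical shift, then truncate


def trunc_ashr(a: int, b: int) -> int:
    """trunc(ashr(sext(a) + sext(b), 1)), returned as a BITS-bit pattern."""
    # Python's >> on a negative int behaves as an arithmetic shift.
    return ((a + b) >> 1) & NARROW_MASK


def shifts_agree() -> bool:
    """Exhaustively compare both forms over every signed BITS-bit input pair."""
    lo, hi = -(1 << (BITS - 1)), 1 << (BITS - 1)
    return all(trunc_lshr(a, b) == trunc_ashr(a, b)
               for a in range(lo, hi) for b in range(lo, hi))
```

Calling `shifts_agree()` walks all 65,536 input pairs. The check says nothing about cases where the extended type is less than twice the narrow width.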

				define <vscale x 8 x i16> @haddu_v8i16(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {
				; SVE-LABEL: haddu_v8i16:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: lsr z2.h, z1.h, #1
				; SVE-NEXT: lsr z3.h, z0.h, #1
				; SVE-NEXT: and z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.h, z3.h, z2.h
				; SVE-NEXT: and z0.h, z0.h, #0x1
				; SVE-NEXT: add z0.h, z1.h, z0.h
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: haddu_v8i16:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.h
				; SVE2-NEXT: uhadd z0.h, p0/m, z0.h, z1.h
				; SVE2-NEXT: ret
				entry:
				%s0s = zext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>
				%s1s = zext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>
				%m = add nuw nsw <vscale x 8 x i32> %s0s, %s1s
				%s = lshr <vscale x 8 x i32> %m, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
				%s2 = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %s2
				}

				define <vscale x 4 x i8> @hadds_v4i8(<vscale x 4 x i8> %s0, <vscale x 4 x i8> %s1) {
				; SVE-LABEL: hadds_v4i8:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: ptrue p0.s
				; SVE-NEXT: sxtb z0.s, p0/m, z0.s
				; SVE-NEXT: sxtb z1.s, p0/m, z1.s
				; SVE-NEXT: add z0.s, z0.s, z1.s
				; SVE-NEXT: asr z0.s, z0.s, #1
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: hadds_v4i8:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.s
				; SVE2-NEXT: sxtb z0.s, p0/m, z0.s
				; SVE2-NEXT: sxtb z1.s, p0/m, z1.s
				; SVE2-NEXT: shadd z0.s, p0/m, z0.s, z1.s
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 4 x i8> %s0 to <vscale x 4 x i16>
				%s1s = sext <vscale x 4 x i8> %s1 to <vscale x 4 x i16>
				%m = add nsw <vscale x 4 x i16> %s0s, %s1s
				%s = ashr <vscale x 4 x i16> %m, shufflevector (<vscale x 4 x i16> insertelement (<vscale x 4 x i16> poison, i16 1, i32 0), <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer)
				%s2 = trunc <vscale x 4 x i16> %s to <vscale x 4 x i8>
				ret <vscale x 4 x i8> %s2
				}

				define <vscale x 4 x i8> @hadds_v4i8_lsh(<vscale x 4 x i8> %s0, <vscale x 4 x i8> %s1) {
				; CHECK-LABEL: hadds_v4i8_lsh:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: sxtb z0.s, p0/m, z0.s
				; CHECK-NEXT: sxtb z1.s, p0/m, z1.s
				; CHECK-NEXT: add z0.s, z0.s, z1.s
				; CHECK-NEXT: and z0.s, z0.s, #0xffff
				; CHECK-NEXT: lsr z0.s, z0.s, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = sext <vscale x 4 x i8> %s0 to <vscale x 4 x i16>
				%s1s = sext <vscale x 4 x i8> %s1 to <vscale x 4 x i16>
				%m = add nsw <vscale x 4 x i16> %s0s, %s1s
				%s = lshr <vscale x 4 x i16> %m, shufflevector (<vscale x 4 x i16> insertelement (<vscale x 4 x i16> poison, i16 1, i32 0), <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer)
				%s2 = trunc <vscale x 4 x i16> %s to <vscale x 4 x i8>
				ret <vscale x 4 x i8> %s2
				}

				define <vscale x 4 x i8> @haddu_v4i8(<vscale x 4 x i8> %s0, <vscale x 4 x i8> %s1) {
				; SVE-LABEL: haddu_v4i8:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: and z0.s, z0.s, #0xff
				; SVE-NEXT: and z1.s, z1.s, #0xff
				; SVE-NEXT: add z0.s, z0.s, z1.s
				; SVE-NEXT: lsr z0.s, z0.s, #1
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: haddu_v4i8:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.s
				; SVE2-NEXT: and z0.s, z0.s, #0xff
				; SVE2-NEXT: and z1.s, z1.s, #0xff
				; SVE2-NEXT: uhadd z0.s, p0/m, z0.s, z1.s
				; SVE2-NEXT: ret
				entry:
				%s0s = zext <vscale x 4 x i8> %s0 to <vscale x 4 x i16>
				%s1s = zext <vscale x 4 x i8> %s1 to <vscale x 4 x i16>
				%m = add nuw nsw <vscale x 4 x i16> %s0s, %s1s
				%s = lshr <vscale x 4 x i16> %m, shufflevector (<vscale x 4 x i16> insertelement (<vscale x 4 x i16> poison, i16 1, i32 0), <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer)
				%s2 = trunc <vscale x 4 x i16> %s to <vscale x 4 x i8>
				ret <vscale x 4 x i8> %s2
				}

				define <vscale x 8 x i8> @hadds_v8i8(<vscale x 8 x i8> %s0, <vscale x 8 x i8> %s1) {
				; SVE-LABEL: hadds_v8i8:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: ptrue p0.h
				; SVE-NEXT: sxtb z0.h, p0/m, z0.h
				; SVE-NEXT: sxtb z1.h, p0/m, z1.h
				; SVE-NEXT: add z0.h, z0.h, z1.h
				; SVE-NEXT: asr z0.h, z0.h, #1
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: hadds_v8i8:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.h
				; SVE2-NEXT: sxtb z0.h, p0/m, z0.h
				; SVE2-NEXT: sxtb z1.h, p0/m, z1.h
				; SVE2-NEXT: shadd z0.h, p0/m, z0.h, z1.h
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 8 x i8> %s0 to <vscale x 8 x i16>
				%s1s = sext <vscale x 8 x i8> %s1 to <vscale x 8 x i16>
				%m = add nsw <vscale x 8 x i16> %s0s, %s1s
				%s = ashr <vscale x 8 x i16> %m, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
				%s2 = trunc <vscale x 8 x i16> %s to <vscale x 8 x i8>
				ret <vscale x 8 x i8> %s2
				}

				define <vscale x 8 x i8> @hadds_v8i8_lsh(<vscale x 8 x i8> %s0, <vscale x 8 x i8> %s1) {
				; CHECK-LABEL: hadds_v8i8_lsh:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: sxtb z0.h, p0/m, z0.h
				; CHECK-NEXT: sxtb z1.h, p0/m, z1.h
				; CHECK-NEXT: add z0.h, z0.h, z1.h
				; CHECK-NEXT: lsr z0.h, z0.h, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = sext <vscale x 8 x i8> %s0 to <vscale x 8 x i16>
				%s1s = sext <vscale x 8 x i8> %s1 to <vscale x 8 x i16>
				%m = add nsw <vscale x 8 x i16> %s0s, %s1s
				%s = lshr <vscale x 8 x i16> %m, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
				%s2 = trunc <vscale x 8 x i16> %s to <vscale x 8 x i8>
				ret <vscale x 8 x i8> %s2
				}

				define <vscale x 8 x i8> @haddu_v8i8(<vscale x 8 x i8> %s0, <vscale x 8 x i8> %s1) {
				; SVE-LABEL: haddu_v8i8:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: and z0.h, z0.h, #0xff
				; SVE-NEXT: and z1.h, z1.h, #0xff
				; SVE-NEXT: add z0.h, z0.h, z1.h
				; SVE-NEXT: lsr z0.h, z0.h, #1
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: haddu_v8i8:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.h
				; SVE2-NEXT: and z0.h, z0.h, #0xff
				; SVE2-NEXT: and z1.h, z1.h, #0xff
				; SVE2-NEXT: uhadd z0.h, p0/m, z0.h, z1.h
				; SVE2-NEXT: ret
				entry:
				%s0s = zext <vscale x 8 x i8> %s0 to <vscale x 8 x i16>
				%s1s = zext <vscale x 8 x i8> %s1 to <vscale x 8 x i16>
				%m = add nuw nsw <vscale x 8 x i16> %s0s, %s1s
				%s = lshr <vscale x 8 x i16> %m, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
				%s2 = trunc <vscale x 8 x i16> %s to <vscale x 8 x i8>
				ret <vscale x 8 x i8> %s2
				}

				define <vscale x 16 x i8> @hadds_v16i8(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {
				; SVE-LABEL: hadds_v16i8:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: asr z2.b, z1.b, #1
				; SVE-NEXT: asr z3.b, z0.b, #1
				; SVE-NEXT: and z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.b, z3.b, z2.b
				; SVE-NEXT: and z0.b, z0.b, #0x1
				; SVE-NEXT: add z0.b, z1.b, z0.b
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: hadds_v16i8:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.b
				; SVE2-NEXT: shadd z0.b, p0/m, z0.b, z1.b
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>
				%s1s = sext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>
				%m = add nsw <vscale x 16 x i16> %s0s, %s1s
				%s = ashr <vscale x 16 x i16> %m, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
				%s2 = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>
				ret <vscale x 16 x i8> %s2
				}

				define <vscale x 16 x i8> @hadds_v16i8_lsh(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {
				; SVE-LABEL: hadds_v16i8_lsh:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: asr z2.b, z1.b, #1
				; SVE-NEXT: asr z3.b, z0.b, #1
				; SVE-NEXT: and z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.b, z3.b, z2.b
				; SVE-NEXT: and z0.b, z0.b, #0x1
				; SVE-NEXT: add z0.b, z1.b, z0.b
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: hadds_v16i8_lsh:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.b
				; SVE2-NEXT: shadd z0.b, p0/m, z0.b, z1.b
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>
				%s1s = sext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>
				%m = add nsw <vscale x 16 x i16> %s0s, %s1s
				%s = lshr <vscale x 16 x i16> %m, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
				%s2 = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>
				ret <vscale x 16 x i8> %s2
				}

				define <vscale x 16 x i8> @haddu_v16i8(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {
				; SVE-LABEL: haddu_v16i8:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: lsr z2.b, z1.b, #1
				; SVE-NEXT: lsr z3.b, z0.b, #1
				; SVE-NEXT: and z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.b, z3.b, z2.b
				; SVE-NEXT: and z0.b, z0.b, #0x1
				; SVE-NEXT: add z0.b, z1.b, z0.b
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: haddu_v16i8:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.b
				; SVE2-NEXT: uhadd z0.b, p0/m, z0.b, z1.b
				; SVE2-NEXT: ret
				entry:
				%s0s = zext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>
				%s1s = zext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>
				%m = add nuw nsw <vscale x 16 x i16> %s0s, %s1s
				%s = lshr <vscale x 16 x i16> %m, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
				%s2 = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>
				ret <vscale x 16 x i8> %s2
				}

				define <vscale x 2 x i64> @rhadds_v2i64(<vscale x 2 x i64> %s0, <vscale x 2 x i64> %s1) {
				; SVE-LABEL: rhadds_v2i64:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: asr z2.d, z1.d, #1
				; SVE-NEXT: asr z3.d, z0.d, #1
				; SVE-NEXT: orr z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.d, z3.d, z2.d
				; SVE-NEXT: and z0.d, z0.d, #0x1
				; SVE-NEXT: add z0.d, z1.d, z0.d
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: rhadds_v2i64:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.d
				; SVE2-NEXT: srhadd z0.d, p0/m, z0.d, z1.d
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 2 x i64> %s0 to <vscale x 2 x i128>
				%s1s = sext <vscale x 2 x i64> %s1 to <vscale x 2 x i128>
				%add = add <vscale x 2 x i128> %s0s, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
				%add2 = add <vscale x 2 x i128> %add, %s1s
				%s = ashr <vscale x 2 x i128> %add2, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
				%result = trunc <vscale x 2 x i128> %s to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %result
				}

				define <vscale x 2 x i64> @rhadds_v2i64_lsh(<vscale x 2 x i64> %s0, <vscale x 2 x i64> %s1) {
				; SVE-LABEL: rhadds_v2i64_lsh:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: asr z2.d, z1.d, #1
				; SVE-NEXT: asr z3.d, z0.d, #1
				; SVE-NEXT: orr z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.d, z3.d, z2.d
				; SVE-NEXT: and z0.d, z0.d, #0x1
				; SVE-NEXT: add z0.d, z1.d, z0.d
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: rhadds_v2i64_lsh:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.d
				; SVE2-NEXT: srhadd z0.d, p0/m, z0.d, z1.d
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 2 x i64> %s0 to <vscale x 2 x i128>
				%s1s = sext <vscale x 2 x i64> %s1 to <vscale x 2 x i128>
				%add = add <vscale x 2 x i128> %s0s, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
				%add2 = add <vscale x 2 x i128> %add, %s1s
				%s = lshr <vscale x 2 x i128> %add2, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
				%result = trunc <vscale x 2 x i128> %s to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %result
				}

				define <vscale x 2 x i64> @rhaddu_v2i64(<vscale x 2 x i64> %s0, <vscale x 2 x i64> %s1) {
				; SVE-LABEL: rhaddu_v2i64:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: lsr z2.d, z1.d, #1
				; SVE-NEXT: lsr z3.d, z0.d, #1
				; SVE-NEXT: orr z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.d, z3.d, z2.d
				; SVE-NEXT: and z0.d, z0.d, #0x1
				; SVE-NEXT: add z0.d, z1.d, z0.d
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: rhaddu_v2i64:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.d
				; SVE2-NEXT: urhadd z0.d, p0/m, z0.d, z1.d
				; SVE2-NEXT: ret
				entry:
				%s0s = zext <vscale x 2 x i64> %s0 to <vscale x 2 x i128>
				%s1s = zext <vscale x 2 x i64> %s1 to <vscale x 2 x i128>
				%add = add nuw nsw <vscale x 2 x i128> %s0s, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
				%add2 = add nuw nsw <vscale x 2 x i128> %add, %s1s
				%s = lshr <vscale x 2 x i128> %add2, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
				%result = trunc <vscale x 2 x i128> %s to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %result
				}

				define <vscale x 2 x i32> @rhadds_v2i32(<vscale x 2 x i32> %s0, <vscale x 2 x i32> %s1) {
				; CHECK-LABEL: rhadds_v2i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov z2.d, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: sxtw z0.d, p0/m, z0.d
				; CHECK-NEXT: sxtw z1.d, p0/m, z1.d
				; CHECK-NEXT: eor z0.d, z0.d, z2.d
				; CHECK-NEXT: sub z0.d, z1.d, z0.d
				; CHECK-NEXT: asr z0.d, z0.d, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = sext <vscale x 2 x i32> %s0 to <vscale x 2 x i64>
				%s1s = sext <vscale x 2 x i32> %s1 to <vscale x 2 x i64>
				%add = add <vscale x 2 x i64> %s0s, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
				%add2 = add <vscale x 2 x i64> %add, %s1s
				%s = ashr <vscale x 2 x i64> %add2, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
				%result = trunc <vscale x 2 x i64> %s to <vscale x 2 x i32>
				ret <vscale x 2 x i32> %result
				}

				define <vscale x 2 x i32> @rhadds_v2i32_lsh(<vscale x 2 x i32> %s0, <vscale x 2 x i32> %s1) {
				; CHECK-LABEL: rhadds_v2i32_lsh:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov z2.d, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: sxtw z0.d, p0/m, z0.d
				; CHECK-NEXT: sxtw z1.d, p0/m, z1.d
				; CHECK-NEXT: eor z0.d, z0.d, z2.d
				; CHECK-NEXT: sub z0.d, z1.d, z0.d
				; CHECK-NEXT: lsr z0.d, z0.d, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = sext <vscale x 2 x i32> %s0 to <vscale x 2 x i64>
				%s1s = sext <vscale x 2 x i32> %s1 to <vscale x 2 x i64>
				%add = add <vscale x 2 x i64> %s0s, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
				%add2 = add <vscale x 2 x i64> %add, %s1s
				%s = lshr <vscale x 2 x i64> %add2, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
				%result = trunc <vscale x 2 x i64> %s to <vscale x 2 x i32>
				ret <vscale x 2 x i32> %result
				}

				define <vscale x 2 x i32> @rhaddu_v2i32(<vscale x 2 x i32> %s0, <vscale x 2 x i32> %s1) {
				; SVE-LABEL: rhaddu_v2i32:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: mov z2.d, #-1 // =0xffffffffffffffff
				; SVE-NEXT: and z0.d, z0.d, #0xffffffff
				; SVE-NEXT: and z1.d, z1.d, #0xffffffff
				; SVE-NEXT: eor z0.d, z0.d, z2.d
				; SVE-NEXT: sub z0.d, z1.d, z0.d
				; SVE-NEXT: lsr z0.d, z0.d, #1
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: rhaddu_v2i32:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.d
				; SVE2-NEXT: and z0.d, z0.d, #0xffffffff
				; SVE2-NEXT: and z1.d, z1.d, #0xffffffff
				; SVE2-NEXT: urhadd z0.d, p0/m, z0.d, z1.d
				; SVE2-NEXT: ret
				entry:
				%s0s = zext <vscale x 2 x i32> %s0 to <vscale x 2 x i64>
				%s1s = zext <vscale x 2 x i32> %s1 to <vscale x 2 x i64>
				%add = add nuw nsw <vscale x 2 x i64> %s0s, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
				%add2 = add nuw nsw <vscale x 2 x i64> %add, %s1s
				%s = lshr <vscale x 2 x i64> %add2, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
				%result = trunc <vscale x 2 x i64> %s to <vscale x 2 x i32>
				ret <vscale x 2 x i32> %result
				}

				define <vscale x 4 x i32> @rhadds_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {
				; SVE-LABEL: rhadds_v4i32:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: asr z2.s, z1.s, #1
				; SVE-NEXT: asr z3.s, z0.s, #1
				; SVE-NEXT: orr z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.s, z3.s, z2.s
				; SVE-NEXT: and z0.s, z0.s, #0x1
				; SVE-NEXT: add z0.s, z1.s, z0.s
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: rhadds_v4i32:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.s
				; SVE2-NEXT: srhadd z0.s, p0/m, z0.s, z1.s
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>
				%s1s = sext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>
				%add = add <vscale x 4 x i64> %s0s, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
				%add2 = add <vscale x 4 x i64> %add, %s1s
				%s = ashr <vscale x 4 x i64> %add2, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
				%result = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %result
				}

				define <vscale x 4 x i32> @rhadds_v4i32_lsh(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {
				; SVE-LABEL: rhadds_v4i32_lsh:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: asr z2.s, z1.s, #1
				; SVE-NEXT: asr z3.s, z0.s, #1
				; SVE-NEXT: orr z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.s, z3.s, z2.s
				; SVE-NEXT: and z0.s, z0.s, #0x1
				; SVE-NEXT: add z0.s, z1.s, z0.s
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: rhadds_v4i32_lsh:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.s
				; SVE2-NEXT: srhadd z0.s, p0/m, z0.s, z1.s
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>
				%s1s = sext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>
				%add = add <vscale x 4 x i64> %s0s, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
				%add2 = add <vscale x 4 x i64> %add, %s1s
				%s = lshr <vscale x 4 x i64> %add2, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
				%result = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %result
				}

				define <vscale x 4 x i32> @rhaddu_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {
				; SVE-LABEL: rhaddu_v4i32:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: lsr z2.s, z1.s, #1
				; SVE-NEXT: lsr z3.s, z0.s, #1
				; SVE-NEXT: orr z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.s, z3.s, z2.s
				; SVE-NEXT: and z0.s, z0.s, #0x1
				; SVE-NEXT: add z0.s, z1.s, z0.s
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: rhaddu_v4i32:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.s
				; SVE2-NEXT: urhadd z0.s, p0/m, z0.s, z1.s
				; SVE2-NEXT: ret
				entry:
				%s0s = zext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>
				%s1s = zext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>
				%add = add nuw nsw <vscale x 4 x i64> %s0s, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
				%add2 = add nuw nsw <vscale x 4 x i64> %add, %s1s
				%s = lshr <vscale x 4 x i64> %add2, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
				%result = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %result
				}

				define <vscale x 2 x i16> @rhadds_v2i16(<vscale x 2 x i16> %s0, <vscale x 2 x i16> %s1) {
				; CHECK-LABEL: rhadds_v2i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov z2.d, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: sxth z0.d, p0/m, z0.d
				; CHECK-NEXT: sxth z1.d, p0/m, z1.d
				; CHECK-NEXT: eor z0.d, z0.d, z2.d
				; CHECK-NEXT: sub z0.d, z1.d, z0.d
				; CHECK-NEXT: asr z0.d, z0.d, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = sext <vscale x 2 x i16> %s0 to <vscale x 2 x i32>
				%s1s = sext <vscale x 2 x i16> %s1 to <vscale x 2 x i32>
				%add = add <vscale x 2 x i32> %s0s, shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
				%add2 = add <vscale x 2 x i32> %add, %s1s
				%s = ashr <vscale x 2 x i32> %add2, shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
				%result = trunc <vscale x 2 x i32> %s to <vscale x 2 x i16>
				ret <vscale x 2 x i16> %result
				}

				define <vscale x 2 x i16> @rhadds_v2i16_lsh(<vscale x 2 x i16> %s0, <vscale x 2 x i16> %s1) {
				; CHECK-LABEL: rhadds_v2i16_lsh:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov z2.d, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: sxth z0.d, p0/m, z0.d
				; CHECK-NEXT: sxth z1.d, p0/m, z1.d
				; CHECK-NEXT: eor z0.d, z0.d, z2.d
				; CHECK-NEXT: sub z0.d, z1.d, z0.d
				hassnaa-arm (Author): Here is a transformation proof for this case: https://alive2.llvm.org/ce/z/vPJi6R @dmgreen
				dmgreen: I think that's another case. This one would be https://alive2.llvm.org/ce/z/tp5NmX. From what I can tell they all look OK, according to alive.
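
As a complement to the Alive2 links above (whose contents are not reproduced here), the following is a minimal sketch that brute-forces a scaled-down i8 analog of the rounding pattern. It compares the IR shape `trunc(lshr(sext(a) + 1 + sext(b), 1))` with the expansion from the patch summary, `(A >> 1) + (B >> 1) + ((A | B) & 1)`, using the arithmetic shifts that the signed `rhadds` SVE checks emit. The function names are hypothetical, and the narrower width is an assumption made to keep the search exhaustive.

```python
BITS = 8                        # narrow lane width (scaled-down stand-in)
MASK = (1 << BITS) - 1
WIDE_MASK = (1 << (2 * BITS)) - 1


def ir_rhadds_lsh(a: int, b: int) -> int:
    """IR pattern: sext both inputs, add 1, add, lshr by 1, trunc."""
    wide = (a + 1 + b) & WIDE_MASK      # two's complement sum in the wide type
    return (wide >> 1) & MASK           # logical shift, then truncate


def expanded_avgceils(a: int, b: int) -> int:
    """Expansion from the summary: shift each input, then add the rounding bit."""
    # >> on Python ints is arithmetic, matching the asr in the SVE checks.
    return ((a >> 1) + (b >> 1) + ((a | b) & 1)) & MASK


def identity_holds() -> bool:
    """Exhaustively compare both forms over every signed BITS-bit input pair."""
    lo, hi = -(1 << (BITS - 1)), 1 << (BITS - 1)
    return all(ir_rhadds_lsh(a, b) == expanded_avgceils(a, b)
               for a in range(lo, hi) for b in range(lo, hi))
```

This is only a sanity check at one width and not a substitute for the Alive2 proofs.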
				; CHECK-NEXT: and z0.d, z0.d, #0xffffffff
				; CHECK-NEXT: lsr z0.d, z0.d, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = sext <vscale x 2 x i16> %s0 to <vscale x 2 x i32>
				%s1s = sext <vscale x 2 x i16> %s1 to <vscale x 2 x i32>
				%add = add <vscale x 2 x i32> %s0s, shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
				%add2 = add <vscale x 2 x i32> %add, %s1s
				%s = lshr <vscale x 2 x i32> %add2, shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
				%result = trunc <vscale x 2 x i32> %s to <vscale x 2 x i16>
				ret <vscale x 2 x i16> %result
				}

				define <vscale x 2 x i16> @rhaddu_v2i16(<vscale x 2 x i16> %s0, <vscale x 2 x i16> %s1) {
				; CHECK-LABEL: rhaddu_v2i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: mov z2.d, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: and z0.d, z0.d, #0xffff
				; CHECK-NEXT: and z1.d, z1.d, #0xffff
				; CHECK-NEXT: eor z0.d, z0.d, z2.d
				; CHECK-NEXT: sub z0.d, z1.d, z0.d
				; CHECK-NEXT: lsr z0.d, z0.d, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = zext <vscale x 2 x i16> %s0 to <vscale x 2 x i32>
				%s1s = zext <vscale x 2 x i16> %s1 to <vscale x 2 x i32>
				%add = add nuw nsw <vscale x 2 x i32> %s0s, shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
				%add2 = add nuw nsw <vscale x 2 x i32> %add, %s1s
				%s = lshr <vscale x 2 x i32> %add2, shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
				%result = trunc <vscale x 2 x i32> %s to <vscale x 2 x i16>
				ret <vscale x 2 x i16> %result
				}

				define <vscale x 4 x i16> @rhadds_v4i16(<vscale x 4 x i16> %s0, <vscale x 4 x i16> %s1) {
				; CHECK-LABEL: rhadds_v4i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov z2.s, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: sxth z0.s, p0/m, z0.s
				; CHECK-NEXT: sxth z1.s, p0/m, z1.s
				; CHECK-NEXT: eor z0.d, z0.d, z2.d
				; CHECK-NEXT: sub z0.s, z1.s, z0.s
				; CHECK-NEXT: asr z0.s, z0.s, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = sext <vscale x 4 x i16> %s0 to <vscale x 4 x i32>
				%s1s = sext <vscale x 4 x i16> %s1 to <vscale x 4 x i32>
				%add = add <vscale x 4 x i32> %s0s, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				%add2 = add <vscale x 4 x i32> %add, %s1s
				%s = ashr <vscale x 4 x i32> %add2, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				%result = trunc <vscale x 4 x i32> %s to <vscale x 4 x i16>
				ret <vscale x 4 x i16> %result
				}

				define <vscale x 4 x i16> @rhadds_v4i16_lsh(<vscale x 4 x i16> %s0, <vscale x 4 x i16> %s1) {
				; CHECK-LABEL: rhadds_v4i16_lsh:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov z2.s, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: sxth z0.s, p0/m, z0.s
				; CHECK-NEXT: sxth z1.s, p0/m, z1.s
				; CHECK-NEXT: eor z0.d, z0.d, z2.d
				; CHECK-NEXT: sub z0.s, z1.s, z0.s
				; CHECK-NEXT: lsr z0.s, z0.s, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = sext <vscale x 4 x i16> %s0 to <vscale x 4 x i32>
				%s1s = sext <vscale x 4 x i16> %s1 to <vscale x 4 x i32>
				%add = add <vscale x 4 x i32> %s0s, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				%add2 = add <vscale x 4 x i32> %add, %s1s
				%s = lshr <vscale x 4 x i32> %add2, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				%result = trunc <vscale x 4 x i32> %s to <vscale x 4 x i16>
				ret <vscale x 4 x i16> %result
				}

				define <vscale x 4 x i16> @rhaddu_v4i16(<vscale x 4 x i16> %s0, <vscale x 4 x i16> %s1) {
				; SVE-LABEL: rhaddu_v4i16:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: mov z2.s, #-1 // =0xffffffffffffffff
				; SVE-NEXT: and z0.s, z0.s, #0xffff
				; SVE-NEXT: and z1.s, z1.s, #0xffff
				; SVE-NEXT: eor z0.d, z0.d, z2.d
				; SVE-NEXT: sub z0.s, z1.s, z0.s
				; SVE-NEXT: lsr z0.s, z0.s, #1
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: rhaddu_v4i16:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.s
				; SVE2-NEXT: and z0.s, z0.s, #0xffff
				; SVE2-NEXT: and z1.s, z1.s, #0xffff
				; SVE2-NEXT: urhadd z0.s, p0/m, z0.s, z1.s
				; SVE2-NEXT: ret
				entry:
				%s0s = zext <vscale x 4 x i16> %s0 to <vscale x 4 x i32>
				%s1s = zext <vscale x 4 x i16> %s1 to <vscale x 4 x i32>
				%add = add nuw nsw <vscale x 4 x i32> %s0s, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				%add2 = add nuw nsw <vscale x 4 x i32> %add, %s1s
				%s = lshr <vscale x 4 x i32> %add2, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				%result = trunc <vscale x 4 x i32> %s to <vscale x 4 x i16>
				ret <vscale x 4 x i16> %result
				}

				define <vscale x 8 x i16> @rhadds_v8i16(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {
				; SVE-LABEL: rhadds_v8i16:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: asr z2.h, z1.h, #1
				; SVE-NEXT: asr z3.h, z0.h, #1
				; SVE-NEXT: orr z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.h, z3.h, z2.h
				; SVE-NEXT: and z0.h, z0.h, #0x1
				; SVE-NEXT: add z0.h, z1.h, z0.h
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: rhadds_v8i16:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.h
				; SVE2-NEXT: srhadd z0.h, p0/m, z0.h, z1.h
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>
				%s1s = sext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>
				%add = add <vscale x 8 x i32> %s0s, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
				%add2 = add <vscale x 8 x i32> %add, %s1s
				%s = ashr <vscale x 8 x i32> %add2, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
				%result = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %result
				}

				define <vscale x 8 x i16> @rhadds_v8i16_lsh(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {
				; SVE-LABEL: rhadds_v8i16_lsh:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: asr z2.h, z1.h, #1
				; SVE-NEXT: asr z3.h, z0.h, #1
				; SVE-NEXT: orr z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.h, z3.h, z2.h
				; SVE-NEXT: and z0.h, z0.h, #0x1
				; SVE-NEXT: add z0.h, z1.h, z0.h
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: rhadds_v8i16_lsh:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.h
				; SVE2-NEXT: srhadd z0.h, p0/m, z0.h, z1.h
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>
				%s1s = sext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>
				%add = add <vscale x 8 x i32> %s0s, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
				%add2 = add <vscale x 8 x i32> %add, %s1s
				%s = lshr <vscale x 8 x i32> %add2, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
				%result = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %result
				}

				define <vscale x 8 x i16> @rhaddu_v8i16(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {
				; SVE-LABEL: rhaddu_v8i16:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: lsr z2.h, z1.h, #1
				; SVE-NEXT: lsr z3.h, z0.h, #1
				; SVE-NEXT: orr z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.h, z3.h, z2.h
				; SVE-NEXT: and z0.h, z0.h, #0x1
				; SVE-NEXT: add z0.h, z1.h, z0.h
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: rhaddu_v8i16:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.h
				; SVE2-NEXT: urhadd z0.h, p0/m, z0.h, z1.h
				; SVE2-NEXT: ret
				entry:
				%s0s = zext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>
				%s1s = zext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>
				%add = add nuw nsw <vscale x 8 x i32> %s0s, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
				%add2 = add nuw nsw <vscale x 8 x i32> %add, %s1s
				%s = lshr <vscale x 8 x i32> %add2, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
				%result = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %result
				}

				define <vscale x 4 x i8> @rhadds_v4i8(<vscale x 4 x i8> %s0, <vscale x 4 x i8> %s1) {
				; CHECK-LABEL: rhadds_v4i8:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov z2.s, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: sxtb z0.s, p0/m, z0.s
				; CHECK-NEXT: sxtb z1.s, p0/m, z1.s
				; CHECK-NEXT: eor z0.d, z0.d, z2.d
				; CHECK-NEXT: sub z0.s, z1.s, z0.s
				; CHECK-NEXT: asr z0.s, z0.s, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = sext <vscale x 4 x i8> %s0 to <vscale x 4 x i16>
				%s1s = sext <vscale x 4 x i8> %s1 to <vscale x 4 x i16>
				%add = add <vscale x 4 x i16> %s0s, shufflevector (<vscale x 4 x i16> insertelement (<vscale x 4 x i16> poison, i16 1, i32 0), <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer)
				%add2 = add <vscale x 4 x i16> %add, %s1s
				%s = ashr <vscale x 4 x i16> %add2, shufflevector (<vscale x 4 x i16> insertelement (<vscale x 4 x i16> poison, i16 1, i32 0), <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer)
				%result = trunc <vscale x 4 x i16> %s to <vscale x 4 x i8>
				ret <vscale x 4 x i8> %result
				}

				define <vscale x 4 x i8> @rhadds_v4i8_lsh(<vscale x 4 x i8> %s0, <vscale x 4 x i8> %s1) {
				; CHECK-LABEL: rhadds_v4i8_lsh:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov z2.s, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: sxtb z0.s, p0/m, z0.s
				; CHECK-NEXT: sxtb z1.s, p0/m, z1.s
				; CHECK-NEXT: eor z0.d, z0.d, z2.d
				; CHECK-NEXT: sub z0.s, z1.s, z0.s
				; CHECK-NEXT: and z0.s, z0.s, #0xffff
				; CHECK-NEXT: lsr z0.s, z0.s, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = sext <vscale x 4 x i8> %s0 to <vscale x 4 x i16>
				%s1s = sext <vscale x 4 x i8> %s1 to <vscale x 4 x i16>
				%add = add <vscale x 4 x i16> %s0s, shufflevector (<vscale x 4 x i16> insertelement (<vscale x 4 x i16> poison, i16 1, i32 0), <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer)
				%add2 = add <vscale x 4 x i16> %add, %s1s
				%s = lshr <vscale x 4 x i16> %add2, shufflevector (<vscale x 4 x i16> insertelement (<vscale x 4 x i16> poison, i16 1, i32 0), <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer)
				%result = trunc <vscale x 4 x i16> %s to <vscale x 4 x i8>
				ret <vscale x 4 x i8> %result
				}

				define <vscale x 4 x i8> @rhaddu_v4i8(<vscale x 4 x i8> %s0, <vscale x 4 x i8> %s1) {
				; CHECK-LABEL: rhaddu_v4i8:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: mov z2.s, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: and z0.s, z0.s, #0xff
				; CHECK-NEXT: and z1.s, z1.s, #0xff
				; CHECK-NEXT: eor z0.d, z0.d, z2.d
				; CHECK-NEXT: sub z0.s, z1.s, z0.s
				; CHECK-NEXT: lsr z0.s, z0.s, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = zext <vscale x 4 x i8> %s0 to <vscale x 4 x i16>
				%s1s = zext <vscale x 4 x i8> %s1 to <vscale x 4 x i16>
				%add = add nuw nsw <vscale x 4 x i16> %s0s, shufflevector (<vscale x 4 x i16> insertelement (<vscale x 4 x i16> poison, i16 1, i32 0), <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer)
				%add2 = add nuw nsw <vscale x 4 x i16> %add, %s1s
				%s = lshr <vscale x 4 x i16> %add2, shufflevector (<vscale x 4 x i16> insertelement (<vscale x 4 x i16> poison, i16 1, i32 0), <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer)
				%result = trunc <vscale x 4 x i16> %s to <vscale x 4 x i8>
				ret <vscale x 4 x i8> %result
				}

				define <vscale x 8 x i8> @rhadds_v8i8(<vscale x 8 x i8> %s0, <vscale x 8 x i8> %s1) {
				; CHECK-LABEL: rhadds_v8i8:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mov z2.h, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: sxtb z0.h, p0/m, z0.h
				; CHECK-NEXT: sxtb z1.h, p0/m, z1.h
				; CHECK-NEXT: eor z0.d, z0.d, z2.d
				; CHECK-NEXT: sub z0.h, z1.h, z0.h
				; CHECK-NEXT: asr z0.h, z0.h, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = sext <vscale x 8 x i8> %s0 to <vscale x 8 x i16>
				%s1s = sext <vscale x 8 x i8> %s1 to <vscale x 8 x i16>
				%add = add <vscale x 8 x i16> %s0s, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
				%add2 = add <vscale x 8 x i16> %add, %s1s
				%s = ashr <vscale x 8 x i16> %add2, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
				%result = trunc <vscale x 8 x i16> %s to <vscale x 8 x i8>
				ret <vscale x 8 x i8> %result
				}

				define <vscale x 8 x i8> @rhadds_v8i8_lsh(<vscale x 8 x i8> %s0, <vscale x 8 x i8> %s1) {
				; CHECK-LABEL: rhadds_v8i8_lsh:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mov z2.h, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: sxtb z0.h, p0/m, z0.h
				; CHECK-NEXT: sxtb z1.h, p0/m, z1.h
				; CHECK-NEXT: eor z0.d, z0.d, z2.d
				; CHECK-NEXT: sub z0.h, z1.h, z0.h
				; CHECK-NEXT: lsr z0.h, z0.h, #1
				; CHECK-NEXT: ret
				entry:
				%s0s = sext <vscale x 8 x i8> %s0 to <vscale x 8 x i16>
				%s1s = sext <vscale x 8 x i8> %s1 to <vscale x 8 x i16>
				%add = add <vscale x 8 x i16> %s0s, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
				%add2 = add <vscale x 8 x i16> %add, %s1s
				%s = lshr <vscale x 8 x i16> %add2, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
				%result = trunc <vscale x 8 x i16> %s to <vscale x 8 x i8>
				ret <vscale x 8 x i8> %result
				}

				define <vscale x 8 x i8> @rhaddu_v8i8(<vscale x 8 x i8> %s0, <vscale x 8 x i8> %s1) {
				; SVE-LABEL: rhaddu_v8i8:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: mov z2.h, #-1 // =0xffffffffffffffff
				; SVE-NEXT: and z0.h, z0.h, #0xff
				; SVE-NEXT: and z1.h, z1.h, #0xff
				; SVE-NEXT: eor z0.d, z0.d, z2.d
				; SVE-NEXT: sub z0.h, z1.h, z0.h
				; SVE-NEXT: lsr z0.h, z0.h, #1
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: rhaddu_v8i8:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.h
				; SVE2-NEXT: and z0.h, z0.h, #0xff
				; SVE2-NEXT: and z1.h, z1.h, #0xff
				; SVE2-NEXT: urhadd z0.h, p0/m, z0.h, z1.h
				; SVE2-NEXT: ret
				entry:
				%s0s = zext <vscale x 8 x i8> %s0 to <vscale x 8 x i16>
				%s1s = zext <vscale x 8 x i8> %s1 to <vscale x 8 x i16>
				%add = add nuw nsw <vscale x 8 x i16> %s0s, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
				%add2 = add nuw nsw <vscale x 8 x i16> %add, %s1s
				%s = lshr <vscale x 8 x i16> %add2, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
				%result = trunc <vscale x 8 x i16> %s to <vscale x 8 x i8>
				ret <vscale x 8 x i8> %result
				}

				define <vscale x 16 x i8> @rhadds_v16i8(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {
				; SVE-LABEL: rhadds_v16i8:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: asr z2.b, z1.b, #1
				; SVE-NEXT: asr z3.b, z0.b, #1
				; SVE-NEXT: orr z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.b, z3.b, z2.b
				; SVE-NEXT: and z0.b, z0.b, #0x1
				; SVE-NEXT: add z0.b, z1.b, z0.b
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: rhadds_v16i8:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.b
				; SVE2-NEXT: srhadd z0.b, p0/m, z0.b, z1.b
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>
				%s1s = sext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>
				%add = add <vscale x 16 x i16> %s0s, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
				%add2 = add <vscale x 16 x i16> %add, %s1s
				%s = ashr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
				%result = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>
				ret <vscale x 16 x i8> %result
				}

				define <vscale x 16 x i8> @rhadds_v16i8_lsh(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {
				; SVE-LABEL: rhadds_v16i8_lsh:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: asr z2.b, z1.b, #1
				; SVE-NEXT: asr z3.b, z0.b, #1
				; SVE-NEXT: orr z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.b, z3.b, z2.b
				; SVE-NEXT: and z0.b, z0.b, #0x1
				; SVE-NEXT: add z0.b, z1.b, z0.b
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: rhadds_v16i8_lsh:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.b
				; SVE2-NEXT: srhadd z0.b, p0/m, z0.b, z1.b
				; SVE2-NEXT: ret
				entry:
				%s0s = sext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>
				%s1s = sext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>
				%add = add <vscale x 16 x i16> %s0s, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
				%add2 = add <vscale x 16 x i16> %add, %s1s
				%s = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
				%result = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>
				ret <vscale x 16 x i8> %result
				}

				define <vscale x 16 x i8> @rhaddu_v16i8(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {
				; SVE-LABEL: rhaddu_v16i8:
				; SVE: // %bb.0: // %entry
				; SVE-NEXT: lsr z2.b, z1.b, #1
				; SVE-NEXT: lsr z3.b, z0.b, #1
				; SVE-NEXT: orr z0.d, z0.d, z1.d
				; SVE-NEXT: add z1.b, z3.b, z2.b
				; SVE-NEXT: and z0.b, z0.b, #0x1
				; SVE-NEXT: add z0.b, z1.b, z0.b
				; SVE-NEXT: ret
				;
				; SVE2-LABEL: rhaddu_v16i8:
				; SVE2: // %bb.0: // %entry
				; SVE2-NEXT: ptrue p0.b
				; SVE2-NEXT: urhadd z0.b, p0/m, z0.b, z1.b
				; SVE2-NEXT: ret
				entry:
				%s0s = zext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>
				%s1s = zext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>
				%add = add nuw nsw <vscale x 16 x i16> %s0s, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
				%add2 = add nuw nsw <vscale x 16 x i16> %add, %s1s
				%s = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
				%result = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>
				ret <vscale x 16 x i8> %result
				}
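
Note on the SVE (non-SVE2) check lines above: the shift/shift/orr/add/and/add sequence matches the expansion given in the summary, AVGCeil(A, B) = (A >> 1) + (B >> 1) + ((A | B) & 1), with asr for the signed tests and lsr for the unsigned ones. The following is only a scalar sketch of that identity in C++, not code from the patch. The function names are hypothetical, and the reference functions assume the widen, add 1, shift, truncate pattern used in the rhaddu/rhadds test IR.

```cpp
#include <cassert>
#include <cstdint>

// Reference form: widen to 16 bits, add with rounding, shift right by one,
// truncate back to 8 bits (mirrors the zext/add/add/lshr/trunc test IR).
inline uint8_t avgCeilURef(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((static_cast<uint16_t>(a) + b + 1) >> 1);
}

// Expanded form from the summary: SRL(A) + SRL(B) + ((A | B) & 1).
// No intermediate value needs more than 8 bits.
inline uint8_t avgCeilUExpanded(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a >> 1) + (b >> 1) + ((a | b) & 1));
}

// Signed reference. Right-shifting a negative value is arithmetic in C++20
// and on mainstream compilers before that; this sketch assumes that behaviour.
inline int8_t avgCeilSRef(int8_t a, int8_t b) {
  return static_cast<int8_t>((static_cast<int16_t>(a) + b + 1) >> 1);
}

// Signed expanded form: the same shape, with arithmetic shifts (asr).
inline int8_t avgCeilSExpanded(int8_t a, int8_t b) {
  return static_cast<int8_t>((a >> 1) + (b >> 1) + ((a | b) & 1));
}

// Exhaustively compare both forms over every pair of 8-bit inputs.
inline bool expansionsMatchExhaustively() {
  for (int a = 0; a < 256; ++a) {
    for (int b = 0; b < 256; ++b) {
      const uint8_t ua = static_cast<uint8_t>(a);
      const uint8_t ub = static_cast<uint8_t>(b);
      if (avgCeilURef(ua, ub) != avgCeilUExpanded(ua, ub))
        return false;
      const int8_t sa = static_cast<int8_t>(a - 128);
      const int8_t sb = static_cast<int8_t>(b - 128);
      if (avgCeilSRef(sa, sb) != avgCeilSExpanded(sa, sb))
        return false;
    }
  }
  return true;
}
```

The identity holds because A = 2 * (A >> 1) + (A & 1), so the rounding term depends only on the two low bits: (A & 1) + (B & 1) + 1 halves to 1 exactly when either low bit is set. The AVGFloor case from the summary is the same with (A & B) & 1 as the correction term.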

llvm/test/CodeGen/AArch64/sve2-hadd.ll

This file was deleted.

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple aarch64-none-eabi -mattr=+sve2 -o - | FileCheck %s

	define <vscale x 2 x i64> @hadds_v2i64(<vscale x 2 x i64> %s0, <vscale x 2 x i64> %s1) {
	; CHECK-LABEL: hadds_v2i64:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.d
	; CHECK-NEXT: shadd z0.d, p0/m, z0.d, z1.d
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 2 x i64> %s0 to <vscale x 2 x i128>
	%s1s = sext <vscale x 2 x i64> %s1 to <vscale x 2 x i128>
	%m = add nsw <vscale x 2 x i128> %s0s, %s1s
	%s = lshr <vscale x 2 x i128> %m, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
	%s2 = trunc <vscale x 2 x i128> %s to <vscale x 2 x i64>
	ret <vscale x 2 x i64> %s2
	}

	define <vscale x 2 x i64> @haddu_v2i64(<vscale x 2 x i64> %s0, <vscale x 2 x i64> %s1) {
	; CHECK-LABEL: haddu_v2i64:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.d
	; CHECK-NEXT: uhadd z0.d, p0/m, z0.d, z1.d
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 2 x i64> %s0 to <vscale x 2 x i128>
	%s1s = zext <vscale x 2 x i64> %s1 to <vscale x 2 x i128>
	%m = add nuw nsw <vscale x 2 x i128> %s0s, %s1s
	%s = lshr <vscale x 2 x i128> %m, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
	%s2 = trunc <vscale x 2 x i128> %s to <vscale x 2 x i64>
	ret <vscale x 2 x i64> %s2
	}

	define <vscale x 2 x i32> @hadds_v2i32(<vscale x 2 x i32> %s0, <vscale x 2 x i32> %s1) {
	; CHECK-LABEL: hadds_v2i32:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.d
	; CHECK-NEXT: sxtw z0.d, p0/m, z0.d
	; CHECK-NEXT: adr z0.d, [z0.d, z1.d, sxtw]
	; CHECK-NEXT: lsr z0.d, z0.d, #1
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 2 x i32> %s0 to <vscale x 2 x i64>
	%s1s = sext <vscale x 2 x i32> %s1 to <vscale x 2 x i64>
	%m = add nsw <vscale x 2 x i64> %s0s, %s1s
	%s = lshr <vscale x 2 x i64> %m, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
	%s2 = trunc <vscale x 2 x i64> %s to <vscale x 2 x i32>
	ret <vscale x 2 x i32> %s2
	}

	define <vscale x 2 x i32> @haddu_v2i32(<vscale x 2 x i32> %s0, <vscale x 2 x i32> %s1) {
	; CHECK-LABEL: haddu_v2i32:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.d
	; CHECK-NEXT: and z0.d, z0.d, #0xffffffff
	; CHECK-NEXT: and z1.d, z1.d, #0xffffffff
	; CHECK-NEXT: uhadd z0.d, p0/m, z0.d, z1.d
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 2 x i32> %s0 to <vscale x 2 x i64>
	%s1s = zext <vscale x 2 x i32> %s1 to <vscale x 2 x i64>
	%m = add nuw nsw <vscale x 2 x i64> %s0s, %s1s
	%s = lshr <vscale x 2 x i64> %m, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
	%s2 = trunc <vscale x 2 x i64> %s to <vscale x 2 x i32>
	ret <vscale x 2 x i32> %s2
	}

	define <vscale x 4 x i32> @hadds_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {
	; CHECK-LABEL: hadds_v4i32:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: shadd z0.s, p0/m, z0.s, z1.s
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>
	%s1s = sext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>
	%m = add nsw <vscale x 4 x i64> %s0s, %s1s
	%s = lshr <vscale x 4 x i64> %m, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	%s2 = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>
	ret <vscale x 4 x i32> %s2
	}

	define <vscale x 4 x i32> @haddu_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {
	; CHECK-LABEL: haddu_v4i32:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: uhadd z0.s, p0/m, z0.s, z1.s
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>
	%s1s = zext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>
	%m = add nuw nsw <vscale x 4 x i64> %s0s, %s1s
	%s = lshr <vscale x 4 x i64> %m, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	%s2 = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>
	ret <vscale x 4 x i32> %s2
	}

	define <vscale x 2 x i16> @hadds_v2i16(<vscale x 2 x i16> %s0, <vscale x 2 x i16> %s1) {
	; CHECK-LABEL: hadds_v2i16:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.d
	; CHECK-NEXT: sxth z0.d, p0/m, z0.d
	; CHECK-NEXT: sxth z1.d, p0/m, z1.d
	; CHECK-NEXT: add z0.d, z0.d, z1.d
	; CHECK-NEXT: and z0.d, z0.d, #0xffffffff
	; CHECK-NEXT: lsr z0.d, z0.d, #1
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 2 x i16> %s0 to <vscale x 2 x i32>
	%s1s = sext <vscale x 2 x i16> %s1 to <vscale x 2 x i32>
	%m = add nsw <vscale x 2 x i32> %s0s, %s1s
	%s = lshr <vscale x 2 x i32> %m, shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
	%s2 = trunc <vscale x 2 x i32> %s to <vscale x 2 x i16>
	ret <vscale x 2 x i16> %s2
	}

	define <vscale x 2 x i16> @haddu_v2i16(<vscale x 2 x i16> %s0, <vscale x 2 x i16> %s1) {
	; CHECK-LABEL: haddu_v2i16:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.d
	; CHECK-NEXT: and z0.d, z0.d, #0xffff
	; CHECK-NEXT: and z1.d, z1.d, #0xffff
	; CHECK-NEXT: uhadd z0.d, p0/m, z0.d, z1.d
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 2 x i16> %s0 to <vscale x 2 x i32>
	%s1s = zext <vscale x 2 x i16> %s1 to <vscale x 2 x i32>
	%m = add nuw nsw <vscale x 2 x i32> %s0s, %s1s
	%s = lshr <vscale x 2 x i32> %m, shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
	%s2 = trunc <vscale x 2 x i32> %s to <vscale x 2 x i16>
	ret <vscale x 2 x i16> %s2
	}

	define <vscale x 4 x i16> @hadds_v4i16(<vscale x 4 x i16> %s0, <vscale x 4 x i16> %s1) {
	; CHECK-LABEL: hadds_v4i16:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: sxth z0.s, p0/m, z0.s
	; CHECK-NEXT: sxth z1.s, p0/m, z1.s
	; CHECK-NEXT: add z0.s, z0.s, z1.s
	; CHECK-NEXT: lsr z0.s, z0.s, #1
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 4 x i16> %s0 to <vscale x 4 x i32>
	%s1s = sext <vscale x 4 x i16> %s1 to <vscale x 4 x i32>
	%m = add nsw <vscale x 4 x i32> %s0s, %s1s
	%s = lshr <vscale x 4 x i32> %m, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
	%s2 = trunc <vscale x 4 x i32> %s to <vscale x 4 x i16>
	ret <vscale x 4 x i16> %s2
	}

	define <vscale x 4 x i16> @haddu_v4i16(<vscale x 4 x i16> %s0, <vscale x 4 x i16> %s1) {
	; CHECK-LABEL: haddu_v4i16:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: and z0.s, z0.s, #0xffff
	; CHECK-NEXT: and z1.s, z1.s, #0xffff
	; CHECK-NEXT: uhadd z0.s, p0/m, z0.s, z1.s
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 4 x i16> %s0 to <vscale x 4 x i32>
	%s1s = zext <vscale x 4 x i16> %s1 to <vscale x 4 x i32>
	%m = add nuw nsw <vscale x 4 x i32> %s0s, %s1s
	%s = lshr <vscale x 4 x i32> %m, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
	%s2 = trunc <vscale x 4 x i32> %s to <vscale x 4 x i16>
	ret <vscale x 4 x i16> %s2
	}

	define <vscale x 8 x i16> @hadds_v8i16(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {
	; CHECK-LABEL: hadds_v8i16:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.h
	; CHECK-NEXT: shadd z0.h, p0/m, z0.h, z1.h
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>
	%s1s = sext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>
	%m = add nsw <vscale x 8 x i32> %s0s, %s1s
	%s = lshr <vscale x 8 x i32> %m, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
	%s2 = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>
	ret <vscale x 8 x i16> %s2
	}

	define <vscale x 8 x i16> @haddu_v8i16(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {
	; CHECK-LABEL: haddu_v8i16:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.h
	; CHECK-NEXT: uhadd z0.h, p0/m, z0.h, z1.h
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>
	%s1s = zext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>
	%m = add nuw nsw <vscale x 8 x i32> %s0s, %s1s
	%s = lshr <vscale x 8 x i32> %m, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
	%s2 = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>
	ret <vscale x 8 x i16> %s2
	}

	define <vscale x 4 x i8> @hadds_v4i8(<vscale x 4 x i8> %s0, <vscale x 4 x i8> %s1) {
	; CHECK-LABEL: hadds_v4i8:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: sxtb z0.s, p0/m, z0.s
	; CHECK-NEXT: sxtb z1.s, p0/m, z1.s
	; CHECK-NEXT: add z0.s, z0.s, z1.s
	; CHECK-NEXT: and z0.s, z0.s, #0xffff
	; CHECK-NEXT: lsr z0.s, z0.s, #1
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 4 x i8> %s0 to <vscale x 4 x i16>
	%s1s = sext <vscale x 4 x i8> %s1 to <vscale x 4 x i16>
	%m = add nsw <vscale x 4 x i16> %s0s, %s1s
	%s = lshr <vscale x 4 x i16> %m, shufflevector (<vscale x 4 x i16> insertelement (<vscale x 4 x i16> poison, i16 1, i32 0), <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer)
	%s2 = trunc <vscale x 4 x i16> %s to <vscale x 4 x i8>
	ret <vscale x 4 x i8> %s2
	}

	define <vscale x 4 x i8> @haddu_v4i8(<vscale x 4 x i8> %s0, <vscale x 4 x i8> %s1) {
	; CHECK-LABEL: haddu_v4i8:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: and z0.s, z0.s, #0xff
	; CHECK-NEXT: and z1.s, z1.s, #0xff
	; CHECK-NEXT: uhadd z0.s, p0/m, z0.s, z1.s
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 4 x i8> %s0 to <vscale x 4 x i16>
	%s1s = zext <vscale x 4 x i8> %s1 to <vscale x 4 x i16>
	%m = add nuw nsw <vscale x 4 x i16> %s0s, %s1s
	%s = lshr <vscale x 4 x i16> %m, shufflevector (<vscale x 4 x i16> insertelement (<vscale x 4 x i16> poison, i16 1, i32 0), <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer)
	%s2 = trunc <vscale x 4 x i16> %s to <vscale x 4 x i8>
	ret <vscale x 4 x i8> %s2
	}

	define <vscale x 8 x i8> @hadds_v8i8(<vscale x 8 x i8> %s0, <vscale x 8 x i8> %s1) {
	; CHECK-LABEL: hadds_v8i8:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.h
	; CHECK-NEXT: sxtb z0.h, p0/m, z0.h
	; CHECK-NEXT: sxtb z1.h, p0/m, z1.h
	; CHECK-NEXT: add z0.h, z0.h, z1.h
	; CHECK-NEXT: lsr z0.h, z0.h, #1
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 8 x i8> %s0 to <vscale x 8 x i16>
	%s1s = sext <vscale x 8 x i8> %s1 to <vscale x 8 x i16>
	%m = add nsw <vscale x 8 x i16> %s0s, %s1s
	%s = lshr <vscale x 8 x i16> %m, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
	%s2 = trunc <vscale x 8 x i16> %s to <vscale x 8 x i8>
	ret <vscale x 8 x i8> %s2
	}

	define <vscale x 8 x i8> @haddu_v8i8(<vscale x 8 x i8> %s0, <vscale x 8 x i8> %s1) {
	; CHECK-LABEL: haddu_v8i8:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.h
	; CHECK-NEXT: and z0.h, z0.h, #0xff
	; CHECK-NEXT: and z1.h, z1.h, #0xff
	; CHECK-NEXT: uhadd z0.h, p0/m, z0.h, z1.h
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 8 x i8> %s0 to <vscale x 8 x i16>
	%s1s = zext <vscale x 8 x i8> %s1 to <vscale x 8 x i16>
	%m = add nuw nsw <vscale x 8 x i16> %s0s, %s1s
	%s = lshr <vscale x 8 x i16> %m, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
	%s2 = trunc <vscale x 8 x i16> %s to <vscale x 8 x i8>
	ret <vscale x 8 x i8> %s2
	}

	define <vscale x 16 x i8> @hadds_v16i8(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {
	; CHECK-LABEL: hadds_v16i8:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.b
	; CHECK-NEXT: shadd z0.b, p0/m, z0.b, z1.b
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>
	%s1s = sext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>
	%m = add nsw <vscale x 16 x i16> %s0s, %s1s
	%s = lshr <vscale x 16 x i16> %m, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
	%s2 = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>
	ret <vscale x 16 x i8> %s2
	}

	define <vscale x 16 x i8> @haddu_v16i8(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {
	; CHECK-LABEL: haddu_v16i8:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.b
	; CHECK-NEXT: uhadd z0.b, p0/m, z0.b, z1.b
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>
	%s1s = zext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>
	%m = add nuw nsw <vscale x 16 x i16> %s0s, %s1s
	%s = lshr <vscale x 16 x i16> %m, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
	%s2 = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>
	ret <vscale x 16 x i8> %s2
	}

	define <vscale x 2 x i64> @rhadds_v2i64(<vscale x 2 x i64> %s0, <vscale x 2 x i64> %s1) {
	; CHECK-LABEL: rhadds_v2i64:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.d
	; CHECK-NEXT: srhadd z0.d, p0/m, z0.d, z1.d
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 2 x i64> %s0 to <vscale x 2 x i128>
	%s1s = sext <vscale x 2 x i64> %s1 to <vscale x 2 x i128>
	%add = add <vscale x 2 x i128> %s0s, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
	%add2 = add <vscale x 2 x i128> %add, %s1s
	%s = lshr <vscale x 2 x i128> %add2, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
	%result = trunc <vscale x 2 x i128> %s to <vscale x 2 x i64>
	ret <vscale x 2 x i64> %result
	}

	define <vscale x 2 x i64> @rhaddu_v2i64(<vscale x 2 x i64> %s0, <vscale x 2 x i64> %s1) {
	; CHECK-LABEL: rhaddu_v2i64:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.d
	; CHECK-NEXT: urhadd z0.d, p0/m, z0.d, z1.d
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 2 x i64> %s0 to <vscale x 2 x i128>
	%s1s = zext <vscale x 2 x i64> %s1 to <vscale x 2 x i128>
	%add = add nuw nsw <vscale x 2 x i128> %s0s, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
	%add2 = add nuw nsw <vscale x 2 x i128> %add, %s1s
	%s = lshr <vscale x 2 x i128> %add2, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
	%result = trunc <vscale x 2 x i128> %s to <vscale x 2 x i64>
	ret <vscale x 2 x i64> %result
	}

	define <vscale x 2 x i32> @rhadds_v2i32(<vscale x 2 x i32> %s0, <vscale x 2 x i32> %s1) {
	; CHECK-LABEL: rhadds_v2i32:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.d
	; CHECK-NEXT: mov z2.d, #-1 // =0xffffffffffffffff
	; CHECK-NEXT: sxtw z0.d, p0/m, z0.d
	; CHECK-NEXT: sxtw z1.d, p0/m, z1.d
	; CHECK-NEXT: eor z0.d, z0.d, z2.d
	; CHECK-NEXT: sub z0.d, z1.d, z0.d
	; CHECK-NEXT: lsr z0.d, z0.d, #1
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 2 x i32> %s0 to <vscale x 2 x i64>
	%s1s = sext <vscale x 2 x i32> %s1 to <vscale x 2 x i64>
	%add = add <vscale x 2 x i64> %s0s, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
	%add2 = add <vscale x 2 x i64> %add, %s1s
	%s = lshr <vscale x 2 x i64> %add2, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
	%result = trunc <vscale x 2 x i64> %s to <vscale x 2 x i32>
	ret <vscale x 2 x i32> %result
	}

	define <vscale x 2 x i32> @rhaddu_v2i32(<vscale x 2 x i32> %s0, <vscale x 2 x i32> %s1) {
	; CHECK-LABEL: rhaddu_v2i32:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.d
	; CHECK-NEXT: and z0.d, z0.d, #0xffffffff
	; CHECK-NEXT: and z1.d, z1.d, #0xffffffff
	; CHECK-NEXT: urhadd z0.d, p0/m, z0.d, z1.d
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 2 x i32> %s0 to <vscale x 2 x i64>
	%s1s = zext <vscale x 2 x i32> %s1 to <vscale x 2 x i64>
	%add = add nuw nsw <vscale x 2 x i64> %s0s, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
	%add2 = add nuw nsw <vscale x 2 x i64> %add, %s1s
	%s = lshr <vscale x 2 x i64> %add2, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
	%result = trunc <vscale x 2 x i64> %s to <vscale x 2 x i32>
	ret <vscale x 2 x i32> %result
	}

	define <vscale x 4 x i32> @rhadds_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {
	; CHECK-LABEL: rhadds_v4i32:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: srhadd z0.s, p0/m, z0.s, z1.s
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>
	%s1s = sext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>
	%add = add <vscale x 4 x i64> %s0s, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	%add2 = add <vscale x 4 x i64> %add, %s1s
	%s = lshr <vscale x 4 x i64> %add2, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	%result = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>
	ret <vscale x 4 x i32> %result
	}

	define <vscale x 4 x i32> @rhaddu_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {
	; CHECK-LABEL: rhaddu_v4i32:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: urhadd z0.s, p0/m, z0.s, z1.s
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>
	%s1s = zext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>
	%add = add nuw nsw <vscale x 4 x i64> %s0s, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	%add2 = add nuw nsw <vscale x 4 x i64> %add, %s1s
	%s = lshr <vscale x 4 x i64> %add2, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	%result = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>
	ret <vscale x 4 x i32> %result
	}

	define <vscale x 2 x i16> @rhadds_v2i16(<vscale x 2 x i16> %s0, <vscale x 2 x i16> %s1) {
	; CHECK-LABEL: rhadds_v2i16:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.d
	; CHECK-NEXT: mov z2.d, #-1 // =0xffffffffffffffff
	; CHECK-NEXT: sxth z0.d, p0/m, z0.d
	; CHECK-NEXT: sxth z1.d, p0/m, z1.d
	; CHECK-NEXT: eor z0.d, z0.d, z2.d
	; CHECK-NEXT: sub z0.d, z1.d, z0.d
	; CHECK-NEXT: and z0.d, z0.d, #0xffffffff
	; CHECK-NEXT: lsr z0.d, z0.d, #1
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 2 x i16> %s0 to <vscale x 2 x i32>
	%s1s = sext <vscale x 2 x i16> %s1 to <vscale x 2 x i32>
	%add = add <vscale x 2 x i32> %s0s, shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
	%add2 = add <vscale x 2 x i32> %add, %s1s
	%s = lshr <vscale x 2 x i32> %add2, shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
	%result = trunc <vscale x 2 x i32> %s to <vscale x 2 x i16>
	ret <vscale x 2 x i16> %result
	}

	define <vscale x 2 x i16> @rhaddu_v2i16(<vscale x 2 x i16> %s0, <vscale x 2 x i16> %s1) {
	; CHECK-LABEL: rhaddu_v2i16:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: mov z2.d, #-1 // =0xffffffffffffffff
	; CHECK-NEXT: and z0.d, z0.d, #0xffff
	; CHECK-NEXT: and z1.d, z1.d, #0xffff
	; CHECK-NEXT: eor z0.d, z0.d, z2.d
	; CHECK-NEXT: sub z0.d, z1.d, z0.d
	; CHECK-NEXT: lsr z0.d, z0.d, #1
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 2 x i16> %s0 to <vscale x 2 x i32>
	%s1s = zext <vscale x 2 x i16> %s1 to <vscale x 2 x i32>
	%add = add nuw nsw <vscale x 2 x i32> %s0s, shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
	%add2 = add nuw nsw <vscale x 2 x i32> %add, %s1s
	%s = lshr <vscale x 2 x i32> %add2, shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
	%result = trunc <vscale x 2 x i32> %s to <vscale x 2 x i16>
	ret <vscale x 2 x i16> %result
	}

	define <vscale x 4 x i16> @rhadds_v4i16(<vscale x 4 x i16> %s0, <vscale x 4 x i16> %s1) {
	; CHECK-LABEL: rhadds_v4i16:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: mov z2.s, #-1 // =0xffffffffffffffff
	; CHECK-NEXT: sxth z0.s, p0/m, z0.s
	; CHECK-NEXT: sxth z1.s, p0/m, z1.s
	; CHECK-NEXT: eor z0.d, z0.d, z2.d
	; CHECK-NEXT: sub z0.s, z1.s, z0.s
	; CHECK-NEXT: lsr z0.s, z0.s, #1
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 4 x i16> %s0 to <vscale x 4 x i32>
	%s1s = sext <vscale x 4 x i16> %s1 to <vscale x 4 x i32>
	%add = add <vscale x 4 x i32> %s0s, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
	%add2 = add <vscale x 4 x i32> %add, %s1s
	%s = lshr <vscale x 4 x i32> %add2, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
	%result = trunc <vscale x 4 x i32> %s to <vscale x 4 x i16>
	ret <vscale x 4 x i16> %result
	}

	define <vscale x 4 x i16> @rhaddu_v4i16(<vscale x 4 x i16> %s0, <vscale x 4 x i16> %s1) {
	; CHECK-LABEL: rhaddu_v4i16:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: and z0.s, z0.s, #0xffff
	; CHECK-NEXT: and z1.s, z1.s, #0xffff
	; CHECK-NEXT: urhadd z0.s, p0/m, z0.s, z1.s
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 4 x i16> %s0 to <vscale x 4 x i32>
	%s1s = zext <vscale x 4 x i16> %s1 to <vscale x 4 x i32>
	%add = add nuw nsw <vscale x 4 x i32> %s0s, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
	%add2 = add nuw nsw <vscale x 4 x i32> %add, %s1s
	%s = lshr <vscale x 4 x i32> %add2, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
	%result = trunc <vscale x 4 x i32> %s to <vscale x 4 x i16>
	ret <vscale x 4 x i16> %result
	}

	define <vscale x 8 x i16> @rhadds_v8i16(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {
	; CHECK-LABEL: rhadds_v8i16:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.h
	; CHECK-NEXT: srhadd z0.h, p0/m, z0.h, z1.h
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>
	%s1s = sext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>
	%add = add <vscale x 8 x i32> %s0s, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
	%add2 = add <vscale x 8 x i32> %add, %s1s
	%s = lshr <vscale x 8 x i32> %add2, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
	%result = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>
	ret <vscale x 8 x i16> %result
	}

	define <vscale x 8 x i16> @rhaddu_v8i16(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {
	; CHECK-LABEL: rhaddu_v8i16:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.h
	; CHECK-NEXT: urhadd z0.h, p0/m, z0.h, z1.h
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>
	%s1s = zext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>
	%add = add nuw nsw <vscale x 8 x i32> %s0s, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
	%add2 = add nuw nsw <vscale x 8 x i32> %add, %s1s
	%s = lshr <vscale x 8 x i32> %add2, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
	%result = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>
	ret <vscale x 8 x i16> %result
	}

	define <vscale x 4 x i8> @rhadds_v4i8(<vscale x 4 x i8> %s0, <vscale x 4 x i8> %s1) {
	; CHECK-LABEL: rhadds_v4i8:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: mov z2.s, #-1 // =0xffffffffffffffff
	; CHECK-NEXT: sxtb z0.s, p0/m, z0.s
	; CHECK-NEXT: sxtb z1.s, p0/m, z1.s
	; CHECK-NEXT: eor z0.d, z0.d, z2.d
	; CHECK-NEXT: sub z0.s, z1.s, z0.s
	; CHECK-NEXT: and z0.s, z0.s, #0xffff
	; CHECK-NEXT: lsr z0.s, z0.s, #1
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 4 x i8> %s0 to <vscale x 4 x i16>
	%s1s = sext <vscale x 4 x i8> %s1 to <vscale x 4 x i16>
	%add = add <vscale x 4 x i16> %s0s, shufflevector (<vscale x 4 x i16> insertelement (<vscale x 4 x i16> poison, i16 1, i32 0), <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer)
	%add2 = add <vscale x 4 x i16> %add, %s1s
	%s = lshr <vscale x 4 x i16> %add2, shufflevector (<vscale x 4 x i16> insertelement (<vscale x 4 x i16> poison, i16 1, i32 0), <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer)
	%result = trunc <vscale x 4 x i16> %s to <vscale x 4 x i8>
	ret <vscale x 4 x i8> %result
	}

	define <vscale x 4 x i8> @rhaddu_v4i8(<vscale x 4 x i8> %s0, <vscale x 4 x i8> %s1) {
	; CHECK-LABEL: rhaddu_v4i8:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: mov z2.s, #-1 // =0xffffffffffffffff
	; CHECK-NEXT: and z0.s, z0.s, #0xff
	; CHECK-NEXT: and z1.s, z1.s, #0xff
	; CHECK-NEXT: eor z0.d, z0.d, z2.d
	; CHECK-NEXT: sub z0.s, z1.s, z0.s
	; CHECK-NEXT: lsr z0.s, z0.s, #1
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 4 x i8> %s0 to <vscale x 4 x i16>
	%s1s = zext <vscale x 4 x i8> %s1 to <vscale x 4 x i16>
	%add = add nuw nsw <vscale x 4 x i16> %s0s, shufflevector (<vscale x 4 x i16> insertelement (<vscale x 4 x i16> poison, i16 1, i32 0), <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer)
	%add2 = add nuw nsw <vscale x 4 x i16> %add, %s1s
	%s = lshr <vscale x 4 x i16> %add2, shufflevector (<vscale x 4 x i16> insertelement (<vscale x 4 x i16> poison, i16 1, i32 0), <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer)
	%result = trunc <vscale x 4 x i16> %s to <vscale x 4 x i8>
	ret <vscale x 4 x i8> %result
	}

	define <vscale x 8 x i8> @rhadds_v8i8(<vscale x 8 x i8> %s0, <vscale x 8 x i8> %s1) {
	; CHECK-LABEL: rhadds_v8i8:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.h
	; CHECK-NEXT: mov z2.h, #-1 // =0xffffffffffffffff
	; CHECK-NEXT: sxtb z0.h, p0/m, z0.h
	; CHECK-NEXT: sxtb z1.h, p0/m, z1.h
	; CHECK-NEXT: eor z0.d, z0.d, z2.d
	; CHECK-NEXT: sub z0.h, z1.h, z0.h
	; CHECK-NEXT: lsr z0.h, z0.h, #1
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 8 x i8> %s0 to <vscale x 8 x i16>
	%s1s = sext <vscale x 8 x i8> %s1 to <vscale x 8 x i16>
	%add = add <vscale x 8 x i16> %s0s, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
	%add2 = add <vscale x 8 x i16> %add, %s1s
	%s = lshr <vscale x 8 x i16> %add2, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
	%result = trunc <vscale x 8 x i16> %s to <vscale x 8 x i8>
	ret <vscale x 8 x i8> %result
	}

	define <vscale x 8 x i8> @rhaddu_v8i8(<vscale x 8 x i8> %s0, <vscale x 8 x i8> %s1) {
	; CHECK-LABEL: rhaddu_v8i8:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.h
	; CHECK-NEXT: and z0.h, z0.h, #0xff
	; CHECK-NEXT: and z1.h, z1.h, #0xff
	; CHECK-NEXT: urhadd z0.h, p0/m, z0.h, z1.h
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 8 x i8> %s0 to <vscale x 8 x i16>
	%s1s = zext <vscale x 8 x i8> %s1 to <vscale x 8 x i16>
	%add = add nuw nsw <vscale x 8 x i16> %s0s, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
	%add2 = add nuw nsw <vscale x 8 x i16> %add, %s1s
	%s = lshr <vscale x 8 x i16> %add2, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
	%result = trunc <vscale x 8 x i16> %s to <vscale x 8 x i8>
	ret <vscale x 8 x i8> %result
	}

	define <vscale x 16 x i8> @rhadds_v16i8(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {
	; CHECK-LABEL: rhadds_v16i8:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.b
	; CHECK-NEXT: srhadd z0.b, p0/m, z0.b, z1.b
	; CHECK-NEXT: ret
	entry:
	%s0s = sext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>
	%s1s = sext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>
	%add = add <vscale x 16 x i16> %s0s, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
	%add2 = add <vscale x 16 x i16> %add, %s1s
	%s = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
	%result = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>
	ret <vscale x 16 x i8> %result
	}

	define <vscale x 16 x i8> @rhaddu_v16i8(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {
	; CHECK-LABEL: rhaddu_v16i8:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: ptrue p0.b
	; CHECK-NEXT: urhadd z0.b, p0/m, z0.b, z1.b
	; CHECK-NEXT: ret
	entry:
	%s0s = zext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>
	%s1s = zext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>
	%add = add nuw nsw <vscale x 16 x i16> %s0s, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
	%add2 = add nuw nsw <vscale x 16 x i16> %add, %s1s
	%s = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
	%result = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>
	ret <vscale x 16 x i8> %result
	}
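
The rhadd tests above all spell the rounding halving add the same way: extend both operands to a wider type, add 1, add the other operand, shift right by one, then truncate. As a rough sanity check (a sketch, not part of the patch), the C++ below compares that widened form on 8-bit values against the narrow AVGCeil expansion from the revision summary, `SRL(A) + SRL(B) + (A|B)&1`, and against the `eor`/`sub` form visible in the unpacked CHECK lines (`b - ~a`, which equals `a + b + 1`). For the signed case it also checks that using `lshr` after `sext` still yields floor((a + b + 1) / 2) once the result is truncated. All function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Widened reference mirroring the unsigned IR tests:
// zext to i16, add 1, add, lshr 1, trunc to i8.
static uint8_t rhaddu_ref(uint8_t a, uint8_t b) {
  uint16_t sum = static_cast<uint16_t>(uint16_t{a} + 1u + uint16_t{b});
  return static_cast<uint8_t>(sum >> 1);
}

// Narrow expansion from the revision summary: SRL(A) + SRL(B) + (A|B)&1.
static uint8_t rhaddu_narrow(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a >> 1) + (b >> 1) + ((a | b) & 1));
}

// Form seen in the unpacked CHECK lines: eor with all-ones, then sub.
// Since ~a == -a - 1, b - ~a == a + b + 1 (modulo 2^16 here).
static uint8_t rhaddu_sub_not(uint8_t a, uint8_t b) {
  uint16_t not_a = static_cast<uint16_t>(~uint16_t{a});
  uint16_t sum = static_cast<uint16_t>(uint16_t{b} - not_a);
  return static_cast<uint8_t>(sum >> 1);
}

// Signed tests: sext to i16, add 1, add, *logical* shift right, trunc to i8.
// The final cast assumes two's complement conversion (guaranteed since C++20).
static int8_t rhadds_ref(int8_t a, int8_t b) {
  uint16_t wa = static_cast<uint16_t>(static_cast<int16_t>(a));
  uint16_t wb = static_cast<uint16_t>(static_cast<int16_t>(b));
  uint16_t sum = static_cast<uint16_t>(wa + 1u + wb);
  return static_cast<int8_t>(static_cast<uint8_t>(sum >> 1));
}

// floor(s / 2) without relying on right-shifting a negative value.
static int floor_half(int s) { return s >= 0 ? s / 2 : -((-s + 1) / 2); }

// Exhaustively compare all 8-bit operand pairs.
static void verify_all_pairs() {
  for (int a = 0; a < 256; ++a)
    for (int b = 0; b < 256; ++b) {
      uint8_t ua = static_cast<uint8_t>(a), ub = static_cast<uint8_t>(b);
      assert(rhaddu_ref(ua, ub) == rhaddu_narrow(ua, ub));
      assert(rhaddu_ref(ua, ub) == rhaddu_sub_not(ua, ub));
      int8_t sa = static_cast<int8_t>(ua), sb = static_cast<int8_t>(ub);
      assert(rhadds_ref(sa, sb) == floor_half(sa + sb + 1));
    }
}
```

Calling `verify_all_pairs()` from a test driver exercises every 8-bit pair. The same argument should carry over to the 16-bit and 32-bit element sizes used in the tests, although this sketch only checks 8-bit values.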

Revision Contents

Diff 503500

llvm/lib/Target/AArch64/AArch64ISelLowering.h

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/test/CodeGen/AArch64/sve-hadd.ll

llvm/test/CodeGen/AArch64/sve2-hadd.ll
