This is an archive of the discontinued LLVM Phabricator instance.


Differential D91255

[AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup))
Closed · Public

Authored by NickGuy on Nov 11 2020, 6:22 AM.


Details

Reviewers

SjoerdMeijer
dmgreen
fhahn
efriedma

Commits

rG350247a93c07: [AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup))

Summary

Performing this rearrangement allows existing patterns to match
cases where operands have been hoisted up to a loop header (by LICM).
This enables optimisations such as umull/smull codegen, while still
allowing loop-invariant values to be hoisted out of the loop after
other optimisations.
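The rearrangement relies on dup(sext(x)) and sext(dup(x)) producing identical lanes, so that the extend can then feed a widening multiply. The following is a minimal, hypothetical C++ model of that equivalence, not LLVM code: std::array stands in for vector values, and the helper names (dup, sextLanes, dupOfSext, sextOfDup, widenMul) are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy model: a "vector" is a fixed-size std::array of lanes.
template <typename T, std::size_t N> using Vec = std::array<T, N>;

// dup: broadcast a scalar into every lane.
template <std::size_t N, typename T> Vec<T, N> dup(T Scalar) {
  Vec<T, N> Out;
  Out.fill(Scalar);
  return Out;
}

// Lane-wise sign extension of i16 lanes to i32 lanes.
template <std::size_t N> Vec<int32_t, N> sextLanes(const Vec<int16_t, N> &In) {
  Vec<int32_t, N> Out;
  for (std::size_t I = 0; I < N; ++I)
    Out[I] = static_cast<int32_t>(In[I]);
  return Out;
}

// Original order: extend the scalar, then broadcast (dup(sext(x))).
template <std::size_t N> Vec<int32_t, N> dupOfSext(int16_t X) {
  return dup<N>(static_cast<int32_t>(X));
}

// Rearranged order: broadcast, then extend lane-wise (sext(dup(x))).
template <std::size_t N> Vec<int32_t, N> sextOfDup(int16_t X) {
  return sextLanes<N>(dup<N>(X));
}

// smull-style widening multiply: i16 x i16 lanes producing i32 lanes.
template <std::size_t N>
Vec<int32_t, N> widenMul(const Vec<int16_t, N> &A, const Vec<int16_t, N> &B) {
  Vec<int32_t, N> Out;
  for (std::size_t I = 0; I < N; ++I)
    Out[I] = static_cast<int32_t>(A[I]) * static_cast<int32_t>(B[I]);
  return Out;
}
```

In this model the two orders agree for every input, which is why the combine can reorder the nodes without changing the result; the benefit on AArch64 is only that the reordered form lines up with the existing smull/umull patterns.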

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

NickGuy created this revision. Nov 11 2020, 6:22 AM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls. Nov 11 2020, 6:22 AM

NickGuy requested review of this revision. Nov 11 2020, 6:22 AM

NickGuy added inline comments. Nov 11 2020, 6:27 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10584–10586	In a number of tests, this line is hit several times, but it fails on only one of them. Using .dump() to try to identify why didn't help, as the failing value appears to be the same style as the others that pass (e.g. %tmp3 = sext <8 x i8> %arg to <8 x i16>). If anyone can provide insight into this, I would greatly appreciate it :)
15020–15029	I'm unsure as to when PerformDAGCombine is invoked. If this function generates a new DUP node, would this function then be invoked with that node? Or does this function need a bit more scaffolding to support this case?

SjoerdMeijer added inline comments. Nov 11 2020, 6:59 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10584–10586	What do you mean by fail? I guess that's a segfault? If so, you probably need to make sure it is an instruction first with dyn_cast, and only then check its operands and uses.

NickGuy added inline comments. Nov 11 2020, 7:05 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10584–10586	Sorry, I should've been a bit clearer. It fails with an assertion error that stems from an invalid cast in CodeGenPrepare::tryToSinkFreeOperands. Shuffle->getOperandUse(0).dump() outputs something that looks like an instruction, but reportedly isn't, and only in a single case. It's simple to work around, but I was curious whether anyone had an idea of what was happening.

Harbormaster completed remote builds in B78452: Diff 304502. Nov 11 2020, 7:16 AM

SjoerdMeijer added inline comments. Nov 11 2020, 7:28 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10577	Nit: do you need the enumerate? Can you do just: for (auto *OpIdx : I->operands())
10584–10586	Still a bit unclear to me. But are you actually checking Shuffle is an instruction? I also don't see what the workaround is.
14986	Nit: A comment on what we are exactly combining here.
14987	Nit: performDUPSextCombine -> performDUPSExtCombine
14988	Nit: think the coding style is to use: auto *Operand = ... when it is a pointer. Same for more declarations below.
14994	Can you run clang-format on your patch? I've tried reading this function, but all these lint messages make it pretty unreadable to me.

Hello

This looks like two separate patches to me. One that folds dup(ext(..)) into ext(dup(..)), and another that tries to sink operands into a loop.

They should be separate patches with their own set of tests. The first, done universally, will need some thought (and maybe some benchmarks) to make sure it's always the correct thing to do, it's not just something that happens to work here. The fact that no other tests are failing is a good sign at least.

> This looks like two separate patches to me.

I've pulled the operand sinking out into its own patch now.

> The first, done universally, will need some thought (and maybe some benchmarks) to make sure it's always the correct thing to do, it's not just something that happens to work here

Agreed, this review request was simply to gather some more opinions on the matter. There was no intention of merging this as-is.

> The fact that no other tests are failing is a good sign at least.

Now that you mention it, it's a bit concerning. It's possible that there are no existing tests covering this case.

NickGuy added a child revision: D91271: [AArch64] Attempt to sink mul operands. Nov 11 2020, 8:48 AM

NickGuy added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10584–10586	Shuffle is always an instruction at this point; the part that's causing the assertion failure is pushing the operand use into Ops. The result of Shuffle->getOperandUse(0) is what is sometimes not an instruction. The workaround is the isa<Instruction>(&Shuffle->getOperandUse(0)) check.
14987	I opted for performDUPExtCombine, as it performs both signed and zero extends
14988	None of these are pointer types, though that's good to know for future :)
14994	Done, sorry about that

Harbormaster completed remote builds in B78468: Diff 304529. Nov 11 2020, 9:22 AM

Thanks

So what this seems to be proposing is that every instance of dup(sext(..)) is transformed into sext(dup(..)). That's a pretty general transform. Off the top of my head... the extend could be free in places (zext i32->i64), but moving it may also allow more folding into other instructions, like the smull/saddl/ssubl it can allow here. If the operand is a load, that should at least have been folded into a single instruction already, prior to the dup being made. But I would probably expect a sxth+dup to be quicker than a dup+sshll in general.

I would suggest trying this on a number of small tests and seeing how it does in terms of instruction count. Alternatively try compiling the llvm testsuite and seeing how many things change, and if they look better or worse in the process. But like I said this may need to be a bit more targeted.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14973	The llvm style guide suggests not to over-use auto like this. It just makes it more difficult telling what things are.
14979	There is no such thing as a "non-vector dup", as far as I understand.
14998	Create a `SDLoc DL(N)` and use it in both getNodes. This doesn't need getVTList for single types, nor {} for the operands, I don't think.
15012	Detecting the extend is probably best done inside the function. It's common to do: if (SDValue Ext = performDUPExtCombine(..)) return Ext; I would personally leave out at least AssertSext and AssertZext until you at least have a test that shows them being needed. If this does SIGN_EXTEND_INREG it should probably handle the equivalent AND as well.
llvm/test/CodeGen/AArch64/aarch64-matrix-smull.ll
4 ↗	(On Diff #304529)	I would expect tests that looked something like this (though I took and edited this one from MVE):

define <4 x i32> @vdup_i16(i16 %src) {
; CHECK-LABEL: vdup_i16:
; CHECK:       @ %bb.0: @ %entry
; CHECK-NEXT:    vdup.16 q0, r0
; CHECK-NEXT:    bx lr
entry:
  %0 = insertelement <4 x i16> undef, i16 %src, i32 0
  %x = shufflevector <4 x i16> %0, <4 x i16> undef, <4 x i32> zeroinitializer
  %out = sext <4 x i16> %x to <4 x i32>
  ret <4 x i32> %out
}

But for all the types and sizes that this transform supports, which seems to be a lot at the moment. Don't forget scalable types too. Tests showing that mul(sext(...), dup(sext(...))) is also folded sound useful too, but they can hopefully be equally small.

dmgreen added inline comments. Nov 11 2020, 2:02 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14994	I'm not sure what this is doing. It should be sign extending from the smaller type with the correct number of vector lanes (either ExtOperand.getValueType() for a sext/zext, Operand.getOperand(1) for a SIGN_EXTEND_INREG, or something based on the mask for an AND). It then probably has to make sure that the new extend is a legal operation.
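The AND case ("something based on the mask") amounts to recognising an all-ones low-bit mask as a zero-extend from a narrower width. A hypothetical sketch of that mapping, independent of the SelectionDAG API (preExtendBitsFromAndMask is an illustrative name, not an LLVM function):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of the AND case: a mask of all-ones in the low bits
// behaves like a zero-extend from that width. Returns the source width in
// bits, or std::nullopt when the mask is not 0xFF, 0xFFFF or 0xFFFFFFFF.
std::optional<unsigned> preExtendBitsFromAndMask(uint64_t Mask) {
  switch (Mask) {
  case 0xFFu:
    return 8;
  case 0xFFFFu:
    return 16;
  case 0xFFFFFFFFu:
    return 32;
  default:
    return std::nullopt; // not a recognised zero-extend mask
  }
}
```

Taking the mask as uint64_t keeps a 64-bit constant such as 0x1000000FF distinct from 0xFF. ConstantSDNode::getZExtValue() returns a uint64_t, so storing it in a uint32_t (as Diff 314550 does) would discard the upper bits and such a mask could be misread as 0xFF.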

Addressed comments, and changed to be more targeted. The rearrangement will only be performed if the pattern feeds into a mul instruction, and only if the type combination is valid for smull/umull folding.

llvm/test/CodeGen/AArch64/aarch64-matrix-smull.ll
4 ↗	(On Diff #304529)	I've added some tests testing the behaviour for different types and sizes (all generated, so the IR is identical apart from the types). I've omitted support for scalable types, as I encountered some issues when testing them. I plan to make progress with the fixed types first, then revisit scalable types later (unless it turns out that to support them, I'm missing one line somewhere).

Harbormaster completed remote builds in B80816: Diff 308966. Dec 2 2020, 8:03 AM

Removed redundant/useless debug prints that were erroneously included in the patch.

Harbormaster completed remote builds in B80820: Diff 308975. Dec 2 2020, 8:29 AM

Sorry for the delay. There is a lot of code here all of a sudden. More than I expected!

Can you run clang-format on the patch to make it more readable?

> Can you run clang-format on the patch to make it more readable?

Done, sorry about that (Guess I should have another look at my pre-commit hooks)

Harbormaster completed remote builds in B81263: Diff 309844. Dec 7 2020, 3:11 AM

Fixed more formatting issues (that were apparently missed by clang-format before?)

Harbormaster completed remote builds in B81296: Diff 309909. Dec 7 2020, 7:51 AM

Thanks, that makes it clearer at least.

There's quite a bit of code here. I was hoping this would be a lot simpler, and it would be good if some of this could be simplified (but a lot of this might well be needed for all the cases that are being supported?). It makes it more difficult to check all the things it's doing.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11565	There is a convenience function, DAG.getUNDEF(VT); Probably doesn't need a lambda as a result.
11597	Debug messages are fairly uncommon in ISel Lowering. This code tends to run a lot of times where they are not important, and the debug messages ISel already prints are usually enough to give a rough indication of what is going on. This whole function could probably be simplified a lot, I think? Is it something like TargetType.getHalfNumVectorElementsVT == PreExtendVT and the TargetType is one of {...}? I'm not sure how the MVT::v16i8 extended to MVT::v8i16 case works out in ISel. I think because ISel is type checked, those cases would not come up even if the underlying instructions supported them? We are not optimizing smull2 as far as I understand.
11628	Do we need to check both shuffles and DUPs? Is that for the i64 types that are otherwise illegal somehow?
11653	A mul should always be an integer I think?
11678–11679	This comment is very short (as in - the length of the line). It probably doesn't need the newline before it either.
11686	Multiplies (and other BinOps) only have 2 operands. It would probably be simpler to just check both the operands as opposed to needing the loop.
11695	This also seems to make changes without necessarily detecting that they are useful. That is probably fine here considering what is being transformed, but it can be better to represent it as "test if it will help, if so then make the change".

NickGuy updated this revision to Diff 311506. Dec 14 2020, 12:26 AM

NickGuy marked 5 inline comments as done.

NickGuy added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11565	Ah, thanks for pointing that out. Lambda is removed, too.
11597	"Debug messages are fairly uncommon in ISel Lowering": removed the debug messages.
11628	No, we don't. It looks like checking for shuffles catches the same DUP cases before they become DUPs. I've removed the DUP-specific check and unified it into the VectorShuffle check.
11653	That was a remnant of before we were restricting this to muls only. Regardless, this has now been removed.
11686	"It would probably be simpler to just check both the operands as opposed to needing the loop": I disagree. Having the loop here makes it clearer that each operand is being handled in the same way, while having them separated needlessly duplicates the code.

Harbormaster completed remote builds in B82226: Diff 311506. Dec 14 2020, 12:36 AM

Fixed broken commit

Harbormaster completed remote builds in B82427: Diff 311840. Dec 15 2020, 1:37 AM

Fixed formatting, and removed "Unsupported combines" tests as they were failing due to a separate issue (and contribute very little value to this patch)

Harbormaster completed remote builds in B82433: Diff 311853. Dec 15 2020, 3:09 AM

In D91255#2454474, @NickGuy wrote:

> Fixed formatting, and removed "Unsupported combines" tests as they were failing due to a separate issue (and contribute very little value to this patch)

It might be worth keeping one or two to show it doesn't do anything wrong/crash on incorrect types.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11573	Add a comment explaining what this function does. It looks generally useful. It might also be worth having this function return the PreExtendType instead, and letting the parent function do the extend to the vector type. Either way is fine, but it might allow the function to be used for other things, if they come up.
11597	Can remove debug here and below.
11619	Nit: Doesn't need a break like this after a return.
11622	I think this can be PostExtendType.changeVectorElementType(PreExtendType);
11628	SDNode -> SDValue
11637	I may be wrong, but is the `PreExtendVT != MVT::v16i8` half of this ever used? TargetType and PreExtendVT should have the same number of vector elements I believe. And then `TargetType.getScalarSizeInBits() == 2 * PreExtendVT.getScalarSizeInBits()` maybe? That might help this lambda simplify further, possibly to the point that it is easier to inline it, now that it is only used in one place.
11670	Nit: DebugLoc is usually called just DL (I think DebugLoc is already the name of the type used in Machine Instructions).
11674	Nit: It doesn't need the {} around the operands, there are overloaded methods to handle that automatically already.
11681	It may be simpler to generate the AArch64ISD::DUP directly? I'm not sure either way which is better, but it's less nodes and we know it's going to get there eventually. Be careful about illegal types though.
11686	It would seem simpler (and smaller) to just call performCommonVectorExtendCombine for each operand. It may need to use something like `Op0 ? Op0 : Mul->getOperand(0)`, but it would remove the need to track Changed and most of the rest of this loop.
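The 11637 suggestion reduces the type check to three conditions: the result is one of the u/smull result types, the lane counts match, and each result lane is exactly twice as wide as the pre-extend lane. A hypothetical model using a plain struct in place of EVT (VecType and isSmullCompatible are illustrative names, not LLVM APIs):

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for an EVT: lane count and lane width.
struct VecType {
  unsigned Lanes;
  unsigned Bits;
};

// Accept only the result types u/smull produces (v8i16, v4i32, v2i64) and
// require the pre-extend vector to have the same lane count at half the width.
bool isSmullCompatible(VecType Target, VecType PreExtend) {
  bool TargetOk = (Target.Lanes == 8 && Target.Bits == 16) ||
                  (Target.Lanes == 4 && Target.Bits == 32) ||
                  (Target.Lanes == 2 && Target.Bits == 64);
  return TargetOk && PreExtend.Lanes == Target.Lanes &&
         Target.Bits == 2 * PreExtend.Bits;
}
```

Under these rules the only accepted pre-extend types are v8i8, v4i16 and v2i32, which are the smull/umull operand types, so a separate v16i8 case has nothing left to contribute.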

Addressed comments, and re-added a couple of "Unsupported combines" tests.

Harbormaster completed remote builds in B82640: Diff 312207. Dec 16 2020, 8:15 AM

Thanks for the updates, this is looking good.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

11585	SDNode* -> SDValue
11691–11695	I think this will always produce a new value, as opposed to returning SDValue when nothing is done. It's probably best to make it something like this instead:

SDValue Op0 = performCommonVectorExtendCombine(Mul->getOperand(0), DAG);
SDValue Op1 = performCommonVectorExtendCombine(Mul->getOperand(1), DAG);
if (!Op0 && !Op1)
  return SDValue();

SDLoc DL(Mul);
return DAG.getNode(Mul->getOpcode(), DL, Mul->getValueType(0),
                   Op0 ? Op0 : Mul->getOperand(0),
                   Op1 ? Op1 : Mul->getOperand(1));

Addressed Comments

NickGuy added inline comments. Dec 18 2020, 3:02 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11637	I'm not entirely sure it is; I added that check to explicitly allow the types accepted by u/smull. I've changed this now.
11681	Generating the DUP directly, while simpler at this point, failed due to this combine being performed fairly early, and the DUP not being handled as part of lowering. Given that, keeping it as shuffle_vector at this point seems simpler, as we can also benefit from the existing VECTOR_SHUFFLE->DUP lowering checks.

Harbormaster completed remote builds in B82929: Diff 312736. Dec 18 2020, 4:13 AM

Fixed incorrect error-handling

Harbormaster completed remote builds in B82955: Diff 312773. Dec 18 2020, 6:27 AM

dmgreen added inline comments. Dec 21 2020, 12:32 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11586	const SDValue &ExtOperand -> SDValue ExtOperand. It's also only used in one place, so it doesn't need its own variable.
11595	const SDValue &TypeOperand -> SDValue TypeOperand
11633	This doesn't seem to be checking that the shuffle/insert are actually a splat. There is an isSplatValue method that could help, depending on if it's really checking what this wants to check.

NickGuy updated this revision to Diff 314185. Dec 31 2020, 5:01 AM

NickGuy marked 3 inline comments as done.

NickGuy added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11633	Done, though I wasn't able to produce IR that represented a shuffle/insert that wasn't a splat, so I haven't got any tests for this case.

Harbormaster completed remote builds in B83784: Diff 314185. Dec 31 2020, 5:48 AM

Added test case for non-splat shuffles

Harbormaster completed remote builds in B84013: Diff 314550. Jan 5 2021, 3:38 AM

dmgreen added inline comments. Jan 5 2021, 8:34 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11717	Although (I know...) I suggested this, it does more than we might expect here, and there's a chance it could change in any number of weird and wonderful ways in the future. As we are relying on this being a vector shuffle of an insert element with a zero mask, I think we should check for exactly that. It should be enough to test that: (1) the shuffle has a zero mask (`cast<ShuffleVectorSDNode>(VectorShuffle)->isSplat()` and `getSplatIndex() == 0`); (2) the first operand is an insert; and (3) that insert is into lane 0.
llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
168	I think this could still count as a splat, as the elements are undef (they can happily take any value, including 0). It's probably fine to use an extra parameter and something like an ext shuffle: %broadcast.splat = shufflevector <8 x i16> %c, <8 x i16> undef, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1>. That should hopefully test both the shuffle not being a splat and the insert not existing.
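The lane-0 splat condition from the 11717 comment, together with the undef subtlety raised for test line 168, can be modelled on a bare mask. In SelectionDAG shuffle masks an index of -1 marks an undef lane; the sketch below (isLaneZeroSplat is a hypothetical name, not an LLVM API) treats such lanes as compatible with any splat:

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: returns true when every defined lane of the shuffle
// mask reads lane 0. A negative index (-1) marks an undef lane, which can
// take any value and so never rules out a splat. An all-undef mask is
// therefore accepted as well.
bool isLaneZeroSplat(const std::vector<int> &Mask) {
  for (int Idx : Mask) {
    if (Idx < 0)
      continue; // undef lane: compatible with a lane-0 splat
    if (Idx != 0)
      return false; // this lane reads some lane other than 0
  }
  return true;
}
```

With this rule the ext-style mask <2, 3, 4, 5, 6, 7, 0, 1> is rejected, while a mask whose only defined lanes read lane 0 is accepted. It covers only the mask; the separate checks that operand 0 is an insert, and that it inserts into lane 0, still apply.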

Addressed comments, updating the checks and tests

Harbormaster completed remote builds in B84188: Diff 314861. Jan 6 2021, 5:39 AM

Thanks. LGTM with a couple of suggestions.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11714	You can probably remove this now and just rely on the dyn_cast<ShuffleVectorSDNode> check below.
11724	Maybe move this comment up above the if, to avoid the misleading indentation.
11733–11739	Maybe something like this?

// Ensures the insert is inserting into lane 0
auto *Constant = dyn_cast<ConstantSDNode>(InsertLane.getNode());
if (!Constant || Constant->getZExtValue() != 0)
  return SDValue();

This revision is now accepted and ready to land. Jan 6 2021, 5:44 AM

Closed by commit rG350247a93c07: [AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup)) (authored by NickGuy). Jan 6 2021, 8:10 AM

This revision was automatically updated to reflect the committed changes.

NickGuy marked 3 inline comments as done.

NickGuy added a commit: rG350247a93c07: [AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup)).

NickGuy mentioned this in D91271: [AArch64] Attempt to sink mul operands. Jan 7 2021, 2:52 AM

This commit caused a crash on our bots with a null LLVMTy at changeExtendedVectorElementType. Reduced case: https://godbolt.org/z/7ehffx

In D91255#2484399, @dmajor wrote:

> This commit caused a crash on our bots with a null LLVMTy at changeExtendedVectorElementType. Reduced case: https://godbolt.org/z/7ehffx

Thanks for bringing this to my attention, I've got a fix in review at D94234.

NickGuy mentioned this in rGed23229a64ae: [AArch64] Fix crash caused by invalid vector element type. Jan 8 2021, 4:03 AM

NickGuy mentioned this in rGdda60035e9f0: [AArch64] Attempt to sink mul operands. Jan 13 2021, 7:23 AM

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	130 lines
llvm/test/CodeGen/AArch64/aarch64-dup-ext-scalable.ll	327 lines
llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll	171 lines

Diff 314550

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Context not available.
  return false;
}

/// Calculates what the pre-extend type is, based on the extension
/// operation node provided by \p Extend.
///
/// In the case that \p Extend is a SIGN_EXTEND or a ZERO_EXTEND, the
/// pre-extend type is pulled directly from the operand, while other extend
/// operations need a bit more inspection to get this information.
///
/// \param Extend The SDNode from the DAG that represents the extend operation
/// \param DAG The SelectionDAG hosting the \p Extend node
///
/// \returns The type representing the \p Extend source type, or \p MVT::Other
/// if no valid type can be determined
static EVT calculatePreExtendType(SDValue Extend, SelectionDAG &DAG) {
  switch (Extend.getOpcode()) {
  case ISD::SIGN_EXTEND:
  case ISD::ZERO_EXTEND:
    return Extend.getOperand(0).getValueType();
  case ISD::AssertSext:
  case ISD::AssertZext:
  case ISD::SIGN_EXTEND_INREG: {
    VTSDNode *TypeNode = dyn_cast<VTSDNode>(Extend.getOperand(1));
    if (!TypeNode)
      return MVT::Other;
    return TypeNode->getVT();
  }
  case ISD::AND: {
    ConstantSDNode *Constant =
        dyn_cast<ConstantSDNode>(Extend.getOperand(1).getNode());
    if (!Constant)
      return MVT::Other;

    uint32_t Mask = Constant->getZExtValue();

    if (Mask == UCHAR_MAX)
      return MVT::i8;
    else if (Mask == USHRT_MAX)
      return MVT::i16;
    else if (Mask == UINT_MAX)
      return MVT::i32;

    return MVT::Other;
  }
  default:
    return MVT::Other;
  }

  llvm_unreachable("Code path unhandled in calculatePreExtendType!");
}

/// Combines a dup(sext/zext) node pattern into sext/zext(dup)
/// making use of the vector SExt/ZExt rather than the scalar SExt/ZExt
static SDValue performCommonVectorExtendCombine(SDValue VectorShuffle,
                                                SelectionDAG &DAG) {

  if (VectorShuffle.getOpcode() != ISD::VECTOR_SHUFFLE)
    return SDValue();

  if (!DAG.isSplatValue(VectorShuffle))
    return SDValue();

  SDValue InsertVectorElt = VectorShuffle.getOperand(0);
  SDValue Extend = InsertVectorElt.getOperand(1);
  unsigned ExtendOpcode = Extend.getOpcode();

  bool IsSExt = ExtendOpcode == ISD::SIGN_EXTEND ||
                ExtendOpcode == ISD::SIGN_EXTEND_INREG ||
                ExtendOpcode == ISD::AssertSext;
  if (!IsSExt && ExtendOpcode != ISD::ZERO_EXTEND &&
      ExtendOpcode != ISD::AssertZext && ExtendOpcode != ISD::AND)
    return SDValue();

  EVT TargetType = VectorShuffle.getValueType();
  EVT PreExtendType = calculatePreExtendType(Extend, DAG);

  if ((TargetType != MVT::v8i16 && TargetType != MVT::v4i32 &&
       TargetType != MVT::v2i64) ||
      (PreExtendType == MVT::Other))
    return SDValue();

  EVT PreExtendVT = TargetType.changeVectorElementType(PreExtendType);

  if (PreExtendVT.getVectorElementCount() != TargetType.getVectorElementCount())
    return SDValue();

  if (TargetType.getScalarSizeInBits() != PreExtendVT.getScalarSizeInBits() * 2)
    return SDValue();

  SDLoc DL(VectorShuffle);

  SDValue InsertVectorNode = DAG.getNode(
      InsertVectorElt.getOpcode(), DL, PreExtendVT, DAG.getUNDEF(PreExtendVT),
      Extend.getOperand(0), DAG.getConstant(0, DL, MVT::i64));

  std::vector<int> ShuffleMask(TargetType.getVectorElementCount().getValue());

  SDValue VectorShuffleNode =
      DAG.getVectorShuffle(PreExtendVT, DL, InsertVectorNode,
                           DAG.getUNDEF(PreExtendVT), ShuffleMask);

  SDValue ExtendNode =
      DAG.getNode(IsSExt ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND, DL, TargetType,
                  VectorShuffleNode, DAG.getValueType(TargetType));

  return ExtendNode;
}

/// Combines a mul(dup(sext/zext)) node pattern into mul(sext/zext(dup))
/// making use of the vector SExt/ZExt rather than the scalar SExt/ZExt
static SDValue performMulVectorExtendCombine(SDNode *Mul, SelectionDAG &DAG) {
  // If the value type isn't a vector, none of the operands are going to be dups
  if (!Mul->getValueType(0).isVector())
    return SDValue();

  SDValue Op0 = performCommonVectorExtendCombine(Mul->getOperand(0), DAG);
  SDValue Op1 = performCommonVectorExtendCombine(Mul->getOperand(1), DAG);

  // Neither operand has been changed, so don't make any further changes
  if (!Op0 && !Op1)
    return SDValue();

  SDLoc DL(Mul);
  return DAG.getNode(Mul->getOpcode(), DL, Mul->getValueType(0),
                     Op0 ? Op0 : Mul->getOperand(0),
                     Op1 ? Op1 : Mul->getOperand(1));
}

static SDValue performMulCombine(SDNode *N, SelectionDAG &DAG,
                                 TargetLowering::DAGCombinerInfo &DCI,
                                 const AArch64Subtarget *Subtarget) {

  if (SDValue Ext = performMulVectorExtendCombine(N, DAG))
    return Ext;

  if (DCI.isBeforeLegalizeOps())
    return SDValue();

Context not available.
		NickGuyAuthorUnsubmitted Done Reply Inline Actions I'm unsure as to when PerformDAGCombine is invoked. If this function generates a new DUP node, would this function then be invoked with that node? Or does this function need a bit more scaffolding to support this case? NickGuy: I'm unsure as to when PerformDAGCombine is invoked. If this function generates a new DUP node…
		SjoerdMeijerUnsubmitted Not Done Reply Inline Actions Nit: performDUPSextCombine -> performDUPSExtCombine SjoerdMeijer: Nit: performDUPSextCombine -> performDUPSExtCombine
		NickGuyAuthorUnsubmitted Done Reply Inline Actions I opted for performDUPExtCombine, as it performs both signed and zero extends NickGuy: I opted for performDUPExtCombine, as it performs both signed and zero extends
		SjoerdMeijer: Nit: I think the coding style is to use `auto *Operand = ...` when it is a pointer. Same for more declarations below.
		NickGuy: None of these are pointer types, though that's good to know for the future :)
		SjoerdMeijer: Nit: A comment on what exactly we are combining here.
		SjoerdMeijer: Can you run clang-format on your patch? I've tried reading this function, but all these lint messages make it pretty unreadable to me.
		NickGuy: Done, sorry about that.
		dmgreen: Detecting the extend is probably best done inside the function. It's common to do: `if (SDValue Ext = performDUPExtCombine(..)) return Ext;` I would personally leave out AssertSext and AssertZext until you have a test that shows them being needed. If this does SIGN_EXTEND_INREG, it should probably handle the equivalent AND as well.
		dmgreen: The LLVM style guide suggests not over-using auto like this. It just makes it more difficult to tell what things are.
		dmgreen: There is no such thing as a "non-vector dup", as far as I understand.
		dmgreen: Create a `SDLoc DL(N)` and use it in both getNodes. I don't think this needs getVTList for single types, or {} for the operands.
		dmgreen: I'm not sure what this is doing. It should be sign extending from the smaller type with the correct number of vector lanes (either ExtOperand.getValueType() for a sext/zext, Operand.getOperand(1) for a SIGN_EXTEND_INREG, or something based on the mask for an AND). It then probably has to make sure that the new extend is a legal operation.
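The suggestion to handle AND alongside SIGN_EXTEND_INREG rests on the fact that `and x, (2^k - 1)` zero-extends the low k bits of x in place, just as `sign_extend_inreg` sign-extends them. The sketch below shows one way to recover k from the mask, assuming the mask is available as a plain 64-bit constant. `zextWidthFromMask` and `maskActsAsUsefulZext` are hypothetical helpers, not LLVM functions, and a real combine would still need to check that the resulting narrow type is legal.

```cpp
#include <cassert>
#include <cstdint>

// Returns k if Mask == 2^k - 1 (a contiguous run of low ones), meaning the
// AND behaves like a zero-extension from a k-bit value; returns 0 otherwise.
unsigned zextWidthFromMask(uint64_t Mask) {
  if (Mask == 0)
    return 0;
  // For a low-bit mask, Mask + 1 is a power of two (or wraps to 0 for
  // all-ones), so it shares no set bits with Mask.
  if ((Mask & (Mask + 1)) != 0)
    return 0;
  unsigned Width = 0;
  while (Mask != 0) {
    Mask >>= 1;
    ++Width;
  }
  return Width;
}

// In this sketch, only i8, i16 and i32 source widths are interesting
// for a widening multiply.
bool maskActsAsUsefulZext(uint64_t Mask) {
  const unsigned Width = zextWidthFromMask(Mask);
  return Width == 8 || Width == 16 || Width == 32;
}
```

For example, the `and w8, w0, #0xff` emitted in the zext tests corresponds to a width of 8.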

llvm/test/CodeGen/AArch64/aarch64-dup-ext-scalable.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple aarch64-none-linux-gnu -mattr=+sve | FileCheck %s

				define <vscale x 2 x i16> @dupsext_v2i8_v2i16(i8 %src, <vscale x 2 x i16> %b) {
				; CHECK-LABEL: dupsext_v2i8_v2i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxtb w8, w0
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i16
				%broadcast.splatinsert = insertelement <vscale x 2 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <vscale x 2 x i16> %broadcast.splatinsert, <vscale x 2 x i16> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nsw <vscale x 2 x i16> %broadcast.splat, %b
				ret <vscale x 2 x i16> %out
				}

				define <vscale x 4 x i16> @dupsext_v4i8_v4i16(i8 %src, <vscale x 4 x i16> %b) {
				; CHECK-LABEL: dupsext_v4i8_v4i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxtb w8, w0
				; CHECK-NEXT: mov z1.s, w8
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i16
				%broadcast.splatinsert = insertelement <vscale x 4 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <vscale x 4 x i16> %broadcast.splatinsert, <vscale x 4 x i16> undef, <vscale x 4 x i32> zeroinitializer
				%out = mul nsw <vscale x 4 x i16> %broadcast.splat, %b
				ret <vscale x 4 x i16> %out
				}

				define <vscale x 8 x i16> @dupsext_v8i8_v8i16(i8 %src, <vscale x 8 x i16> %b) {
				; CHECK-LABEL: dupsext_v8i8_v8i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxtb w8, w0
				; CHECK-NEXT: mov z1.h, w8
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mul z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i16
				%broadcast.splatinsert = insertelement <vscale x 8 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <vscale x 8 x i16> %broadcast.splatinsert, <vscale x 8 x i16> undef, <vscale x 8 x i32> zeroinitializer
				%out = mul nsw <vscale x 8 x i16> %broadcast.splat, %b
				ret <vscale x 8 x i16> %out
				}

				define <vscale x 2 x i32> @dupsext_v2i8_v2i32(i8 %src, <vscale x 2 x i32> %b) {
				; CHECK-LABEL: dupsext_v2i8_v2i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxtb w8, w0
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i32
				%broadcast.splatinsert = insertelement <vscale x 2 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <vscale x 2 x i32> %broadcast.splatinsert, <vscale x 2 x i32> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nsw <vscale x 2 x i32> %broadcast.splat, %b
				ret <vscale x 2 x i32> %out
				}

				define <vscale x 4 x i32> @dupsext_v4i8_v4i32(i8 %src, <vscale x 4 x i32> %b) {
				; CHECK-LABEL: dupsext_v4i8_v4i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxtb w8, w0
				; CHECK-NEXT: mov z1.s, w8
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i32
				%broadcast.splatinsert = insertelement <vscale x 4 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <vscale x 4 x i32> %broadcast.splatinsert, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer
				%out = mul nsw <vscale x 4 x i32> %broadcast.splat, %b
				ret <vscale x 4 x i32> %out
				}

				define <vscale x 2 x i64> @dupsext_v2i8_v2i64(i8 %src, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: dupsext_v2i8_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: sxtb x8, w0
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i64
				%broadcast.splatinsert = insertelement <vscale x 2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <vscale x 2 x i64> %broadcast.splatinsert, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nsw <vscale x 2 x i64> %broadcast.splat, %b
				ret <vscale x 2 x i64> %out
				}

				define <vscale x 2 x i32> @dupsext_v2i16_v2i32(i16 %src, <vscale x 2 x i32> %b) {
				; CHECK-LABEL: dupsext_v2i16_v2i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxth w8, w0
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = sext i16 %src to i32
				%broadcast.splatinsert = insertelement <vscale x 2 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <vscale x 2 x i32> %broadcast.splatinsert, <vscale x 2 x i32> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nsw <vscale x 2 x i32> %broadcast.splat, %b
				ret <vscale x 2 x i32> %out
				}

				define <vscale x 4 x i32> @dupsext_v4i16_v4i32(i16 %src, <vscale x 4 x i32> %b) {
				; CHECK-LABEL: dupsext_v4i16_v4i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxth w8, w0
				; CHECK-NEXT: mov z1.s, w8
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%in = sext i16 %src to i32
				%broadcast.splatinsert = insertelement <vscale x 4 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <vscale x 4 x i32> %broadcast.splatinsert, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer
				%out = mul nsw <vscale x 4 x i32> %broadcast.splat, %b
				ret <vscale x 4 x i32> %out
				}

				define <vscale x 2 x i64> @dupsext_v2i16_v2i64(i16 %src, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: dupsext_v2i16_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: sxth x8, w0
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = sext i16 %src to i64
				%broadcast.splatinsert = insertelement <vscale x 2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <vscale x 2 x i64> %broadcast.splatinsert, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nsw <vscale x 2 x i64> %broadcast.splat, %b
				ret <vscale x 2 x i64> %out
				}

				define <vscale x 2 x i64> @dupsext_v2i32_v2i64(i32 %src, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: dupsext_v2i32_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: sxtw x8, w0
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = sext i32 %src to i64
				%broadcast.splatinsert = insertelement <vscale x 2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <vscale x 2 x i64> %broadcast.splatinsert, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nsw <vscale x 2 x i64> %broadcast.splat, %b
				ret <vscale x 2 x i64> %out
				}

				define <vscale x 2 x i16> @dupzext_v2i8_v2i16(i8 %src, <vscale x 2 x i16> %b) {
				; CHECK-LABEL: dupzext_v2i8_v2i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: and w8, w0, #0xff
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = zext i8 %src to i16
				%broadcast.splatinsert = insertelement <vscale x 2 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <vscale x 2 x i16> %broadcast.splatinsert, <vscale x 2 x i16> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nuw <vscale x 2 x i16> %broadcast.splat, %b
				ret <vscale x 2 x i16> %out
				}

				define <vscale x 4 x i16> @dupzext_v4i8_v4i16(i8 %src, <vscale x 4 x i16> %b) {
				; CHECK-LABEL: dupzext_v4i8_v4i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: and w8, w0, #0xff
				; CHECK-NEXT: mov z1.s, w8
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%in = zext i8 %src to i16
				%broadcast.splatinsert = insertelement <vscale x 4 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <vscale x 4 x i16> %broadcast.splatinsert, <vscale x 4 x i16> undef, <vscale x 4 x i32> zeroinitializer
				%out = mul nuw <vscale x 4 x i16> %broadcast.splat, %b
				ret <vscale x 4 x i16> %out
				}

				define <vscale x 8 x i16> @dupzext_v8i8_v8i16(i8 %src, <vscale x 8 x i16> %b) {
				; CHECK-LABEL: dupzext_v8i8_v8i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: and w8, w0, #0xff
				; CHECK-NEXT: mov z1.h, w8
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mul z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				entry:
				%in = zext i8 %src to i16
				%broadcast.splatinsert = insertelement <vscale x 8 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <vscale x 8 x i16> %broadcast.splatinsert, <vscale x 8 x i16> undef, <vscale x 8 x i32> zeroinitializer
				%out = mul nuw <vscale x 8 x i16> %broadcast.splat, %b
				ret <vscale x 8 x i16> %out
				}

				define <vscale x 2 x i32> @dupzext_v2i8_v2i32(i8 %src, <vscale x 2 x i32> %b) {
				; CHECK-LABEL: dupzext_v2i8_v2i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: and w8, w0, #0xff
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = zext i8 %src to i32
				%broadcast.splatinsert = insertelement <vscale x 2 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <vscale x 2 x i32> %broadcast.splatinsert, <vscale x 2 x i32> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nuw <vscale x 2 x i32> %broadcast.splat, %b
				ret <vscale x 2 x i32> %out
				}

				define <vscale x 4 x i32> @dupzext_v4i8_v4i32(i8 %src, <vscale x 4 x i32> %b) {
				; CHECK-LABEL: dupzext_v4i8_v4i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: and w8, w0, #0xff
				; CHECK-NEXT: mov z1.s, w8
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%in = zext i8 %src to i32
				%broadcast.splatinsert = insertelement <vscale x 4 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <vscale x 4 x i32> %broadcast.splatinsert, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer
				%out = mul nuw <vscale x 4 x i32> %broadcast.splat, %b
				ret <vscale x 4 x i32> %out
				}

				define <vscale x 2 x i64> @dupzext_v2i8_v2i64(i8 %src, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: dupzext_v2i8_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: and x8, x0, #0xff
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = zext i8 %src to i64
				%broadcast.splatinsert = insertelement <vscale x 2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <vscale x 2 x i64> %broadcast.splatinsert, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nuw <vscale x 2 x i64> %broadcast.splat, %b
				ret <vscale x 2 x i64> %out
				}

				define <vscale x 2 x i32> @dupzext_v2i16_v2i32(i16 %src, <vscale x 2 x i32> %b) {
				; CHECK-LABEL: dupzext_v2i16_v2i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: and w8, w0, #0xffff
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = zext i16 %src to i32
				%broadcast.splatinsert = insertelement <vscale x 2 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <vscale x 2 x i32> %broadcast.splatinsert, <vscale x 2 x i32> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nuw <vscale x 2 x i32> %broadcast.splat, %b
				ret <vscale x 2 x i32> %out
				}

				define <vscale x 4 x i32> @dupzext_v4i16_v4i32(i16 %src, <vscale x 4 x i32> %b) {
				; CHECK-LABEL: dupzext_v4i16_v4i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: and w8, w0, #0xffff
				; CHECK-NEXT: mov z1.s, w8
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%in = zext i16 %src to i32
				%broadcast.splatinsert = insertelement <vscale x 4 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <vscale x 4 x i32> %broadcast.splatinsert, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer
				%out = mul nuw <vscale x 4 x i32> %broadcast.splat, %b
				ret <vscale x 4 x i32> %out
				}

				define <vscale x 2 x i64> @dupzext_v2i16_v2i64(i16 %src, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: dupzext_v2i16_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: and x8, x0, #0xffff
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = zext i16 %src to i64
				%broadcast.splatinsert = insertelement <vscale x 2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <vscale x 2 x i64> %broadcast.splatinsert, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nuw <vscale x 2 x i64> %broadcast.splat, %b
				ret <vscale x 2 x i64> %out
				}

				define <vscale x 2 x i64> @dupzext_v2i32_v2i64(i32 %src, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: dupzext_v2i32_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = zext i32 %src to i64
				%broadcast.splatinsert = insertelement <vscale x 2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <vscale x 2 x i64> %broadcast.splatinsert, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nuw <vscale x 2 x i64> %broadcast.splat, %b
				ret <vscale x 2 x i64> %out
				}

llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple aarch64-none-linux-gnu | FileCheck %s

				; Supported combines

				define <8 x i16> @dupsext_v8i8_v8i16(i8 %src, <8 x i8> %b) {
				; CHECK-LABEL: dupsext_v8i8_v8i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: dup v1.8b, w0
				; CHECK-NEXT: smull v0.8h, v1.8b, v0.8b
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i16
				%ext.b = sext <8 x i8> %b to <8 x i16>
				%broadcast.splatinsert = insertelement <8 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <8 x i16> %broadcast.splatinsert, <8 x i16> undef, <8 x i32> zeroinitializer
				%out = mul nsw <8 x i16> %broadcast.splat, %ext.b
				ret <8 x i16> %out
				}

				define <8 x i16> @dupzext_v8i8_v8i16(i8 %src, <8 x i8> %b) {
				; CHECK-LABEL: dupzext_v8i8_v8i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: dup v1.8b, w0
				; CHECK-NEXT: umull v0.8h, v1.8b, v0.8b
				; CHECK-NEXT: ret
				entry:
				%in = zext i8 %src to i16
				%ext.b = zext <8 x i8> %b to <8 x i16>
				%broadcast.splatinsert = insertelement <8 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <8 x i16> %broadcast.splatinsert, <8 x i16> undef, <8 x i32> zeroinitializer
				%out = mul nuw <8 x i16> %broadcast.splat, %ext.b
				ret <8 x i16> %out
				}

				define <4 x i32> @dupsext_v4i16_v4i32(i16 %src, <4 x i16> %b) {
				; CHECK-LABEL: dupsext_v4i16_v4i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: dup v1.4h, w0
				; CHECK-NEXT: smull v0.4s, v1.4h, v0.4h
				; CHECK-NEXT: ret
				entry:
				%in = sext i16 %src to i32
				%ext.b = sext <4 x i16> %b to <4 x i32>
				%broadcast.splatinsert = insertelement <4 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
				%out = mul nsw <4 x i32> %broadcast.splat, %ext.b
				ret <4 x i32> %out
				}

				define <4 x i32> @dupzext_v4i16_v4i32(i16 %src, <4 x i16> %b) {
				; CHECK-LABEL: dupzext_v4i16_v4i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: dup v1.4h, w0
				; CHECK-NEXT: umull v0.4s, v1.4h, v0.4h
				; CHECK-NEXT: ret
				entry:
				%in = zext i16 %src to i32
				%ext.b = zext <4 x i16> %b to <4 x i32>
				%broadcast.splatinsert = insertelement <4 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
				%out = mul nuw <4 x i32> %broadcast.splat, %ext.b
				ret <4 x i32> %out
				}

				define <2 x i64> @dupsext_v2i32_v2i64(i32 %src, <2 x i32> %b) {
				; CHECK-LABEL: dupsext_v2i32_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: dup v1.2s, w0
				; CHECK-NEXT: smull v0.2d, v1.2s, v0.2s
				; CHECK-NEXT: ret
				entry:
				%in = sext i32 %src to i64
				%ext.b = sext <2 x i32> %b to <2 x i64>
				%broadcast.splatinsert = insertelement <2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <2 x i64> %broadcast.splatinsert, <2 x i64> undef, <2 x i32> zeroinitializer
				%out = mul nsw <2 x i64> %broadcast.splat, %ext.b
				ret <2 x i64> %out
				}

				define <2 x i64> @dupzext_v2i32_v2i64(i32 %src, <2 x i32> %b) {
				; CHECK-LABEL: dupzext_v2i32_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: dup v1.2s, w0
				; CHECK-NEXT: umull v0.2d, v1.2s, v0.2s
				; CHECK-NEXT: ret
				entry:
				%in = zext i32 %src to i64
				%ext.b = zext <2 x i32> %b to <2 x i64>
				%broadcast.splatinsert = insertelement <2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <2 x i64> %broadcast.splatinsert, <2 x i64> undef, <2 x i32> zeroinitializer
				%out = mul nuw <2 x i64> %broadcast.splat, %ext.b
				ret <2 x i64> %out
				}
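Each supported case above lowers to a single `smull` or `umull`. These instructions multiply 64-bit vectors of narrow lanes and write double-width products. Using them in place of `mul` on the extended operands loses nothing, because the product of two n-bit values always fits in 2n bits. The sketch below checks this for the i8 to i16 case and models the `dup` + `smull` sequence lane by lane. It is an illustrative model with hypothetical function names, not the architectural definition of the instructions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Checks that every product of two signed 8-bit values fits in a signed
// 16-bit lane, so mul(sext, sext) on <8 x i16> never wraps.
bool signedI8ProductsFitI16() {
  for (int A = INT8_MIN; A <= INT8_MAX; ++A)
    for (int B = INT8_MIN; B <= INT8_MAX; ++B)
      if (A * B < INT16_MIN || A * B > INT16_MAX)
        return false;
  return true;
}

// Same check for unsigned 8-bit values and unsigned 16-bit lanes (umull).
bool unsignedI8ProductsFitI16() {
  for (int A = 0; A <= UINT8_MAX; ++A)
    for (int B = 0; B <= UINT8_MAX; ++B)
      if (A * B > UINT16_MAX)
        return false;
  return true;
}

// Lane-wise model of `dup v1.8b, w0; smull v0.8h, v1.8b, v0.8b`.
std::array<int16_t, 8> smullWithSplat(int8_t Scalar,
                                      const std::array<int8_t, 8> &B) {
  std::array<int16_t, 8> Out{};
  for (std::size_t I = 0; I < 8; ++I)
    Out[I] = static_cast<int16_t>(Scalar * B[I]); // exact: the product fits
  return Out;
}
```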

				; Unsupported combines

				define <2 x i16> @dupsext_v2i8_v2i16(i8 %src, <2 x i8> %b) {
				; CHECK-LABEL: dupsext_v2i8_v2i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxtb w8, w0
				; CHECK-NEXT: shl v0.2s, v0.2s, #24
				; CHECK-NEXT: sshr v0.2s, v0.2s, #24
				; CHECK-NEXT: dup v1.2s, w8
				; CHECK-NEXT: mul v0.2s, v1.2s, v0.2s
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i16
				%ext.b = sext <2 x i8> %b to <2 x i16>
				%broadcast.splatinsert = insertelement <2 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <2 x i16> %broadcast.splatinsert, <2 x i16> undef, <2 x i32> zeroinitializer
				%out = mul nsw <2 x i16> %broadcast.splat, %ext.b
				ret <2 x i16> %out
				}

				define <2 x i64> @dupzext_v2i16_v2i64(i16 %src, <2 x i16> %b) {
				; CHECK-LABEL: dupzext_v2i16_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: movi d1, #0x00ffff0000ffff
				; CHECK-NEXT: and v0.8b, v0.8b, v1.8b
				; CHECK-NEXT: ushll v0.2d, v0.2s, #0
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: and x8, x0, #0xffff
				; CHECK-NEXT: fmov x10, d0
				; CHECK-NEXT: mov x9, v0.d[1]
				; CHECK-NEXT: mul x10, x8, x10
				; CHECK-NEXT: mul x8, x8, x9
				; CHECK-NEXT: fmov d0, x10
				; CHECK-NEXT: mov v0.d[1], x8
				; CHECK-NEXT: ret
				entry:
				%in = zext i16 %src to i64
				%ext.b = zext <2 x i16> %b to <2 x i64>
				%broadcast.splatinsert = insertelement <2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <2 x i64> %broadcast.splatinsert, <2 x i64> undef, <2 x i32> zeroinitializer
				%out = mul nuw <2 x i64> %broadcast.splat, %ext.b
				ret <2 x i64> %out
				}

				; dupsext_v4i8_v4i16
				; dupsext_v2i8_v2i32
				; dupsext_v4i8_v4i32
				; dupsext_v2i8_v2i64
				; dupsext_v2i16_v2i32
				; dupsext_v2i16_v2i64
				; dupzext_v2i8_v2i16
				; dupzext_v4i8_v4i16
				; dupzext_v2i8_v2i32
				; dupzext_v4i8_v4i32
				; dupzext_v2i8_v2i64
				; dupzext_v2i16_v2i32
				; dupzext_v2i16_v2i64
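The supported and unsupported lists in this file follow a simple shape rule. Each supported case extends a 64-bit vector (v8i8, v4i16 or v2i32) to exactly twice its element width, which is the operand shape `smull`/`umull` accept. Every listed unsupported case either starts from a narrower-than-64-bit vector or widens by more than 2x. The sketch below encodes that rule, assuming it is the only criterion. `ExtendShape` and `mapsToWideningMul` are hypothetical and were written to match these tests; they are not the check used in the patch.

```cpp
#include <cassert>

// Describes a fixed-length extend: <Lanes x iSrcBits> -> <Lanes x iDstBits>.
struct ExtendShape {
  unsigned Lanes;
  unsigned SrcBits;
  unsigned DstBits;
};

// True if the extended multiply can become one smull/umull: the narrow
// operand fills a 64-bit D register and the result doubles the lane width.
bool mapsToWideningMul(const ExtendShape &S) {
  return S.Lanes * S.SrcBits == 64 && S.DstBits == 2 * S.SrcBits;
}
```

Under this rule, dupsext_v4i8_v4i16 is rejected because 4 x 8 = 32 bits, and dupzext_v2i16_v2i64 is rejected because it widens by 4x.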

				; Unsupported states

				define <8 x i16> @nonsplat_shuffle(i8 %src, <8 x i8> %b) {
				; CHECK-LABEL: nonsplat_shuffle:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxtb w8, w0
				; CHECK-NEXT: sshll v0.8h, v0.8b, #0
				; CHECK-NEXT: dup v1.8h, w8
				; CHECK-NEXT: mul v0.8h, v1.8h, v0.8h
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i16
				%ext.b = sext <8 x i8> %b to <8 x i16>
				%broadcast.splatinsert = insertelement <8 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <8 x i16> %broadcast.splatinsert, <8 x i16> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 0>
				dmgreen: I think this could still count as a splat, as the other elements are undef (they can happily take any value, including 0). It's probably fine to use an extra parameter and something like an ext shuffle: `%broadcast.splat = shufflevector <8 x i16> %c, <8 x i16> undef, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1>`. That should hopefully test things like it not being a splat and the insert not existing.
				%out = mul nsw <8 x i16> %broadcast.splat, %ext.b
				ret <8 x i16> %out
				}
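The review note on nonsplat_shuffle depends on how undef mask elements are treated. An undef lane may take any value, so a mask is still a splat as long as every defined element selects the same source lane. The sketch below encodes undef as -1, following LLVM's C++ convention for shuffle masks. `isSplatMaskSketch` is a hypothetical standalone function, not the SelectionDAG helper, and it treats an all-undef mask as a splat.

```cpp
#include <cassert>
#include <vector>

// Returns true if every defined (non-negative) mask element selects the same
// source lane. Elements equal to -1 are undef and match anything.
bool isSplatMaskSketch(const std::vector<int> &Mask) {
  int SplatIndex = -1;
  for (int Elt : Mask) {
    if (Elt < 0)
      continue; // undef lane: may take any value
    if (SplatIndex < 0)
      SplatIndex = Elt;
    else if (Elt != SplatIndex)
      return false;
  }
  return true;
}
```

The original test mask `<0, undef, ..., undef, 0>` is accepted, while the suggested rotation mask `<2, 3, 4, 5, 6, 7, 0, 1>` is rejected.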

Revision Contents

Diff 314550

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/test/CodeGen/AArch64/aarch64-dup-ext-scalable.ll

llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
