This is an archive of the discontinued LLVM Phabricator instance.

Differential D91255

[AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup))
ClosedPublic

Authored by NickGuy on Nov 11 2020, 6:22 AM.

Details

Reviewers

SjoerdMeijer
dmgreen
fhahn
efriedma

Commits

rG350247a93c07: [AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup))

Summary

Performing this rearrangement allows for existing patterns
to match against cases where operands are hoisted up to a loop
header (by LICM), allowing for optimisations such as umull/smull
codegen, while still allowing loop-invariant values to be hoisted
out of the loop after other optimisations.
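
As a rough illustration of why the rearrangement preserves values, the standalone C++ sketch below models 8-lane vectors with plain arrays and checks that extending a scalar and then broadcasting it gives the same lanes as broadcasting the narrow scalar and then extending the vector. Nothing here is an LLVM API: `dupLanes`, `sextLanes`, `dupOfSext`, `sextOfDup` and `smullModel` are hypothetical names used only for this model.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model only: plain arrays stand in for NEON vectors.
using V8i8 = std::array<std::int8_t, 8>;
using V8i16 = std::array<std::int16_t, 8>;

// Broadcast one scalar into every lane (models a dup).
template <typename T> std::array<T, 8> dupLanes(T Scalar) {
  std::array<T, 8> Out{};
  Out.fill(Scalar);
  return Out;
}

// Widen each i8 lane to i16, preserving the sign (models a vector sext).
V8i16 sextLanes(const V8i8 &In) {
  V8i16 Out{};
  for (std::size_t I = 0; I < In.size(); ++I)
    Out[I] = static_cast<std::int16_t>(In[I]);
  return Out;
}

// Shape before the combine: scalar extend first, then broadcast.
V8i16 dupOfSext(std::int8_t Scalar) {
  return dupLanes<std::int16_t>(static_cast<std::int16_t>(Scalar));
}

// Shape after the combine: broadcast the narrow scalar, then extend lanes.
V8i16 sextOfDup(std::int8_t Scalar) {
  return sextLanes(dupLanes<std::int8_t>(Scalar));
}

// Models a widening multiply of two narrow vectors (the smull shape).
V8i16 smullModel(const V8i8 &A, const V8i8 &B) {
  V8i16 Out{};
  for (std::size_t I = 0; I < A.size(); ++I)
    Out[I] = static_cast<std::int16_t>(static_cast<int>(A[I]) *
                                       static_cast<int>(B[I]));
  return Out;
}
```

Once both multiply operands are vector extends of narrow vectors, the multiply has the shape that `smullModel` stands in for, which is the case the summary says existing patterns can match.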

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

NickGuy created this revision.Nov 11 2020, 6:22 AM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls. · Nov 11 2020, 6:22 AM

NickGuy requested review of this revision.Nov 11 2020, 6:22 AM

NickGuy added inline comments.Nov 11 2020, 6:27 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10948–10950	In a number of tests, this line is hit more than a few times, and it fails on only one of them. Using .dump() to try and identify why didn't help, as the failing value appears to be in the same style as the others that pass (e.g. %tmp3 = sext <8 x i8> %arg to <8 x i16>). If anyone can provide insight into this, I would greatly appreciate it :)
15532–15541	I'm unsure as to when PerformDAGCombine is invoked. If this function generates a new DUP node, would this function then be invoked with that node? Or does this function need a bit more scaffolding to support this case?

SjoerdMeijer added inline comments.Nov 11 2020, 6:59 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10948–10950	what do you mean by fail? I guess that's a segfault? I guess you need to make sure it is an instruction first with dyn_cast, then you check its operands and uses.

NickGuy added inline comments.Nov 11 2020, 7:05 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10948–10950	Sorry, I should've been a bit clearer. It fails with an assertion error that stems from an invalid cast in CodeGenPrepare::tryToSinkFreeOperands. Shuffle->getOperandUse(0).dump() outputs something that looks like an instruction, but reportedly isn't, and only in a single case. It's simple to work around, but I was curious as to if anyone had an idea as to what was happening.

Harbormaster completed remote builds in B78452: Diff 304502.Nov 11 2020, 7:16 AM

SjoerdMeijer added inline comments.Nov 11 2020, 7:28 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10941	Nit: do you need the enumerate? Can you do just: for (auto *OpIdx : I->operands())
10948–10950	Still a bit unclear to me. But are you actually checking Shuffle is an instruction? I also don't see what the workaround is.
15498	Nit: A comment on what we are exactly combining here.
15499	Nit: performDUPSextCombine -> performDUPSExtCombine
15500	Nit: think the coding style is to use: auto *Operand = ... when it is a pointer. Same for more declarations below.
15506	Can you run clang-format on your patch? I've tried reading this function, but all these lint messages make it pretty unreadable to me.

Hello

This looks like two separate patches to me. One that folds dup(ext(..)) into ext(dup(..)), and another that tries to sink operands into a loop.

They should be separate patches with their own set of tests. The first, done universally, will need some thought (and maybe some benchmarks) to make sure it's always the correct thing to do, it's not just something that happens to work here. The fact that no other tests are failing is a good sign at least.

This looks like two separate patches to me.

I've pulled the operand sinking out to its own patch now.

The first, done universally, will need some thought (and maybe some benchmarks) to make sure it's always the correct thing to do, it's not just something that happens to work here

Agreed, this review request was simply to gather some more opinions on the matter. There was no intention of merging this as-is.

The fact that no other tests are failing is a good sign at least.

Now that you mention it, it's a bit concerning. It's possible that there are no existing tests covering this case.

NickGuy added a child revision: D91271: [AArch64] Attempt to sink mul operands.Nov 11 2020, 8:48 AM

NickGuy added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10948–10950	Shuffle is always an instruction at this point; the part that's causing the assertion failure is pushing the operand use into Ops. The result of Shuffle->getOperandUse(0) is what is sometimes not an instruction. The workaround is the isa<Instruction>(&Shuffle->getOperandUse(0)) check.
15499	I opted for performDUPExtCombine, as it performs both signed and zero extends
15500	None of these are pointer types, though that's good to know for future :)
15506	Done, sorry about that

Harbormaster completed remote builds in B78468: Diff 304529.Nov 11 2020, 9:22 AM

Thanks

So what this seems to be proposing is that every instance of dup(sext(..)) is transformed into sext(dup(..)). That's a pretty general transform. Off the top of my head... the extend could be free in places (zext i32->i64) but it may allow more folding into other instructions like the smull/saddl/ssubl it can allow here. If the operand is a load, that should at least have been folded into a single instruction already prior to the dup being made. But I would probably expect a sxth+dup to be quicker than a dup+sshll in general.

I would suggest trying this on a number of small tests and seeing how it does in terms of instruction count. Alternatively try compiling the llvm testsuite and seeing how many things change, and if they look better or worse in the process. But like I said this may need to be a bit more targeted.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15503	The llvm style guide suggests not to over-use auto like this. It just makes it more difficult telling what things are.
15509	There is no such thing as a "non-vector dup", as far as I understand.
15528	Create a `SDLoc DL(N)` and use it in both getNodes. This doesn't need getVTList for single types. Or {} for the operands I don't think.
15542	Detecting the extend is probably best done inside the function. It's common to do: if (SDValue Ext = performDUPExtCombine(..)) return Ext; I would personally leave out at least AssertSext and AssertZext until you at least have a test that shows them being needed. If this does SIGN_EXTEND_INREG it should probably handle the equivalent AND as well.
llvm/test/CodeGen/AArch64/aarch64-matrix-smull.ll
4 ↗	(On Diff #304529)	I would expect tests that looked something like (but I got and edited this one from mve):

define <4 x i32> @vdup_i16(i16 %src) {
; CHECK-LABEL: vdup_i16:
; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vdup.16 q0, r0
; CHECK-NEXT: bx lr
entry:
  %0 = insertelement <4 x i16> undef, i16 %src, i32 0
  %x = shufflevector <4 x i16> %0, <4 x i16> undef, <4 x i32> zeroinitializer
  %out = sext <4 x i16> %0 to <4 x i32>
  ret <4 x i32> %out
}

But for all types and sizes that this transform supports. Which seems to be a lot at the moment. Don't forget scalable types too. Having tests that show that mul(sext(...), dup(sext(...))) are also folded sounds useful too, but they can hopefully be equally small.

dmgreen added inline comments.Nov 11 2020, 2:02 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15524	I'm not sure what this is doing. It should be sign extending from the smaller type with the correct number of vector lanes (either ExtOperand.getValueType() for a sext/zext or Operand.getOperand(1) for a SIGN_EXTEND_INREG, and something based on the mask for an AND). It then probably has to make sure that the new extend is a legal operation.
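
For the AND case mentioned in this comment, the idea of deriving the pre-extend width from the mask can be sketched in standalone C++. This is only a model under stated assumptions: `preExtendBitsFromMask` is a hypothetical helper, not an LLVM function, and it takes a 64-bit mask so that wider constants are not truncated before the comparison (a choice made for this sketch).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: an AND with an all-ones low mask behaves like a zero
// extend from that many bits. Returns the source width in bits, or nullopt
// when the mask is not one of the recognised full-width masks.
std::optional<unsigned> preExtendBitsFromMask(std::uint64_t Mask) {
  switch (Mask) {
  case 0xFFu:
    return 8u; // zero extend from i8
  case 0xFFFFu:
    return 16u; // zero extend from i16
  case 0xFFFFFFFFu:
    return 32u; // zero extend from i32
  default:
    return std::nullopt; // partial or unrelated mask: not an extend
  }
}
```

A caller would treat the empty result the same way as any other unsupported extend and leave the node alone.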

Addressed comments, and changed to be more targeted. The rearrangement will only be performed if the pattern feeds into a mul instruction, and only if the type combination is valid for smull/umull folding.

llvm/test/CodeGen/AArch64/aarch64-matrix-smull.ll
4 ↗	(On Diff #304529)	I've added some tests testing the behaviour for different types and sizes (all generated, so the IR is identical apart from the types). I've omitted support for scalable types, as I encountered some issues when testing them. I plan to make progress with the fixed types first, then revisit scalable types later (Unless it turns out that to support them, I'm missing 1 line somewhere).

Harbormaster completed remote builds in B80816: Diff 308966.Dec 2 2020, 8:03 AM

Removed redundant/useless debug prints that were erroneously included in the patch.

Harbormaster completed remote builds in B80820: Diff 308975.Dec 2 2020, 8:29 AM

Sorry for the delay. There is a lot of code here all of a sudden. More than I expected!

Can you run clang-format on the patch to make it more readable?

Can you run clang-format on the patch to make it more readable?

Done, sorry about that (Guess I should have another look at my pre-commit hooks)

Harbormaster completed remote builds in B81263: Diff 309844.Dec 7 2020, 3:11 AM

Fixed more formatting issues (that were apparently missed by clang-format before?)

Harbormaster completed remote builds in B81296: Diff 309909.Dec 7 2020, 7:51 AM

Thanks, that makes it clearer at least.

There's quite a bit of code here. I was hoping this would be a lot simpler, and it would be good if some of this could be simplified (but a lot of this might well be needed for all the cases that are being supported?). It makes it more difficult to check all the things it's doing.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11788	There is a convenience function, DAG.getUNDEF(VT); Probably doesn't need a lambda as a result.
11820	Debug messages are fairly uncommon in ISel Lowering. It tends to trigger a lot of times where they are not important, and the debug messages it prints already are usually enough to give a rough indication of what is going on. This whole function probably simplifies a lot I think? Is it something like TargetType.getHalfNumVectorElementsVT == PreExtendVT and the TargetType is one of {...}? I'm not sure how the MVT::v16i8 extended to MVT::v8i16 works out, from ISel. I think because ISel is type checked, those cases would not come up even if the underlying instructions supported it? We are not optimizing smull2 as far as I understand.
11851	Do we need to check both shuffles and DUP's? Is that for the i64 types that are otherwise illegal somehow?
11876	A mul should always be an integer I think?
11901–11902	This comment is very short (as in - the length of the line). It probably doesn't need the newline before it either.
11909	Multiplies (and other BinOps) only have 2 operands. It would probably be simpler to just check both the operands as opposed to needing the loop.
11918	This also seems to make changes without necessarily detecting that they are useful. That is probably fine here considering what is being transformed, but it can be better to represent it as "test if it will help, if so then make the change".

NickGuy updated this revision to Diff 311506.Dec 14 2020, 12:26 AM

NickGuy marked 5 inline comments as done.

NickGuy added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11788	Ah, thanks for pointing that out. Lambda is removed, too.
11820	"Debug messages are fairly uncommon in ISel Lowering": Removed the debug messages.
11851	No, we don't. Looks like checking for shuffles catches the same DUP cases before they become DUPs. I've removed the DUP-specific check, and have unified it down to the VectorShuffle check.
11876	That was a remnant of before we were restricting this to muls only. Regardless, this has now been removed.
11909	"It would probably be simpler to just check both the operands as opposed to needing the loop": I disagree. Having the loop here makes it clearer that each operand is being handled in the same way, while having them separated needlessly duplicates the code.

Harbormaster completed remote builds in B82226: Diff 311506.Dec 14 2020, 12:36 AM

Fixed broken commit

Harbormaster completed remote builds in B82427: Diff 311840.Dec 15 2020, 1:37 AM

Fixed formatting, and removed "Unsupported combines" tests as they were failing due to a separate issue (and contribute very little value to this patch)

Harbormaster completed remote builds in B82433: Diff 311853.Dec 15 2020, 3:09 AM

In D91255#2454474, @NickGuy wrote:

Fixed formatting, and removed "Unsupported combines" tests as they were failing due to a separate issue (and contribute very little value to this patch)

It might be worth keeping one or two to show it doesn't do anything wrong/crash on incorrect types.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11708	Add a message explaining what this function does. It looks generally useful. It might also be worth having this function return the PreExtendType instead, and letting the parent function do the extend to vector type. Either way is fine but it might allow the function to be used for other things, if they come up.
11732	Can remove debug here and below.
11754	Nit: Doesn't need a break like this after a return.
11757	I think this can be PostExtendType.changeVectorElementType(PreExtendType);
11763	SDNode -> SDValue
11772	I may be wrong, but is the `PreExtendVT != MVT::v16i8` half of this ever used? TargetType and PreExtendVT should have the same number of vector elements I believe. And then `TargetType.getScalarSizeInBits() == 2 * PreExtendVT.getScalarSizeInBits()` maybe? That might help this lambda simplify further, possibly to the point that it is easier to inline it, now that it is only used in one place.
11805	Nit: DebugLoc is usually called just DL (I think DebugLoc is already the name of the type used in Machine Instructions).
11809	Nit: It doesn't need the {} around the operands, there are overloaded methods to handle that automatically already.
11816	It may be simpler to generate the AArch64ISD::DUP directly? I'm not sure either way which is better, but it's less nodes and we know it's going to get there eventually. Be careful about illegal types though.
11909	It would seem simpler (and smaller) to just call performCommonVectorExtendCombine for each operand. It may need to use something like `Op0 ? Op0 : Mul->getOperand(0)`, but it would remove the need to track Changed and most of the rest of this loop.
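
The type relationship discussed for line 11772 (same lane count, element width exactly doubled) can be modelled outside LLVM. The sketch below is an assumption-laden toy: `VecType` and `isWideningMulCandidate` are hypothetical names, and the list of accepted result types (v8i16, v4i32, v2i64) is taken from the committed diff shown later on this page.

```cpp
#include <cassert>

// Toy description of a fixed-width vector type: lane count and bits per lane.
struct VecType {
  unsigned Lanes;
  unsigned ScalarBits;
};

// Hypothetical check: the extended (target) type must be one of the result
// types a widening multiply can produce, and each lane must be exactly twice
// as wide as the pre-extend element. The lane count is shared by construction
// because the pre-extend vector is derived from the target type.
bool isWideningMulCandidate(VecType Target, unsigned PreExtendScalarBits) {
  const bool SupportedTarget =
      (Target.Lanes == 8 && Target.ScalarBits == 16) ||
      (Target.Lanes == 4 && Target.ScalarBits == 32) ||
      (Target.Lanes == 2 && Target.ScalarBits == 64);
  return SupportedTarget && Target.ScalarBits == 2 * PreExtendScalarBits;
}
```

For example, an i8 scalar feeding a v8i16 splat passes, while an i8 scalar feeding a v4i32 splat does not, since that would need a 4x extend.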

Addressed comments, and re-added a couple of "Unsupported combines" tests.

Harbormaster completed remote builds in B82640: Diff 312207.Dec 16 2020, 8:15 AM

Thanks for the updates, this is looking good.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

11720

SDNode* -> SDValue

11826–11830

I think this will always produce a new value, as opposed to returning SDValue when nothing is done. It's probably best to make it something like this instead:

SDValue Op0 = performCommonVectorExtendCombine(Mul->getOperand(0), DAG);
SDValue Op1 = performCommonVectorExtendCombine(Mul->getOperand(1), DAG);
if (!Op0 && !Op1)
  return SDValue();

SDLoc DL(Mul);
return DAG.getNode(Mul->getOpcode(), DL, Mul->getValueType(0),
                   Op0 ? Op0 : Mul->getOperand(0),
                   Op1 ? Op1 : Mul->getOperand(1));
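
The control flow suggested above (try each operand, return an empty value if neither changed, otherwise rebuild with whichever operands were rewritten) can be mimicked without LLVM types. In this hedged sketch, strings stand in for DAG nodes and `std::optional` stands in for an empty `SDValue`; `rewriteOperand` and `rewriteMul` are hypothetical names, and only the sign-extend spelling is modelled.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the per-operand combine: rewrites
// "dup(sext(X))" to "sext(dup(X))" and signals "no change" with nullopt.
std::optional<std::string> rewriteOperand(const std::string &Op) {
  const std::string Prefix = "dup(sext(";
  const std::string Suffix = "))";
  if (Op.size() <= Prefix.size() + Suffix.size() ||
      Op.compare(0, Prefix.size(), Prefix) != 0 ||
      Op.compare(Op.size() - Suffix.size(), Suffix.size(), Suffix) != 0)
    return std::nullopt;
  const std::string Inner =
      Op.substr(Prefix.size(), Op.size() - Prefix.size() - Suffix.size());
  return "sext(dup(" + Inner + "))";
}

// Mirrors the suggested structure: only build a new multiply when at least
// one operand was actually rewritten; otherwise report that nothing was done.
std::optional<std::string> rewriteMul(const std::string &Lhs,
                                      const std::string &Rhs) {
  const std::optional<std::string> NewLhs = rewriteOperand(Lhs);
  const std::optional<std::string> NewRhs = rewriteOperand(Rhs);
  if (!NewLhs && !NewRhs)
    return std::nullopt;
  return "mul(" + NewLhs.value_or(Lhs) + ", " + NewRhs.value_or(Rhs) + ")";
}
```

Returning the empty value in the unchanged case matters because a combine that always produces a fresh node would look like progress to its caller even when nothing was improved.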

Addressed Comments

NickGuy added inline comments.Dec 18 2020, 3:02 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11772	Not entirely sure it is, I added that check to explicitly allow the types accepted by u/smull. I've changed this now.
11816	Generating the DUP directly, while simpler at this point, failed due to this combine being performed fairly early, and the DUP not being handled as part of lowering. Given that, keeping it as shuffle_vector at this point seems simpler, as we can also benefit from the existing VECTOR_SHUFFLE->DUP lowering checks.

Harbormaster completed remote builds in B82929: Diff 312736.Dec 18 2020, 4:13 AM

Fixed incorrect error-handling

Harbormaster completed remote builds in B82955: Diff 312773.Dec 18 2020, 6:27 AM

dmgreen added inline comments.Dec 21 2020, 12:32 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11721	const SDValue &ExtOperand -> SDValue ExtOperand It's also only used in one place and needn't get a variable.
11730	const SDValue &TypeOperand -> SDValue TypeOperand
11768	This doesn't seem to be checking that the shuffle/insert are actually a splat. There is an isSplatValue method that could help, depending on if it's really checking what this wants to check.

NickGuy updated this revision to Diff 314185.Dec 31 2020, 5:01 AM

NickGuy marked 3 inline comments as done.

NickGuy added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11768	Done, though I wasn't able to produce IR that represented a shuffle/insert that wasn't a splat, so I haven't got any tests for this case.

Harbormaster completed remote builds in B83784: Diff 314185.Dec 31 2020, 5:48 AM

Added test case for non-splat shuffles

Harbormaster completed remote builds in B84013: Diff 314550.Jan 5 2021, 3:38 AM

dmgreen added inline comments.Jan 5 2021, 8:34 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11717	Although (I know..) I suggested this, it does more than what we might expect here. And there's the chance it could change in any number of weird and wonderful ways in the future. As we are relying on this being a vector shuffle with an insert element and a zero mask, I think we should check for that. I think it should be enough to test that:
- The shuffle vector has a zero mask (ShuffleVectorSDNode(VectorShuffle)->isSplat(), getSplatIndex()==0)
- The first operand is an insert
- That is inserting into lane 0.
llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
169	I think this could still count as a splat, as the elements are undef (can happily take any value, including 0). It's probably fine to use an extra parameter and something like an ext shuffle: %broadcast.splat = shufflevector <8 x i16> %c, <8 x i16> undef, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1> That should hopefully test things like it not being a splat and the insert not existing.
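
The point about undef lanes can be made concrete with a small standalone model of a shuffle mask. This is a sketch only: `splatIndex` and `isZeroSplat` are hypothetical helpers that loosely imitate the splat checks named in the comment above, with -1 marking an undef lane. Treating an all-undef mask as a splat of lane 0 is an assumption of this sketch.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Returns the source lane selected by every defined mask element, or nullopt
// if two defined elements disagree. Undef lanes (-1) are ignored because they
// may take any value.
std::optional<int> splatIndex(const std::vector<int> &Mask) {
  std::optional<int> Index;
  for (int Lane : Mask) {
    if (Lane < 0)
      continue; // undef: compatible with any splat
    if (!Index)
      Index = Lane;
    else if (*Index != Lane)
      return std::nullopt; // two different source lanes: not a splat
  }
  // Assumption of this sketch: an all-undef mask counts as a lane-0 splat.
  return Index.value_or(0);
}

// True when the mask broadcasts lane 0, the only shape the combine relies on.
bool isZeroSplat(const std::vector<int> &Mask) {
  const std::optional<int> Index = splatIndex(Mask);
  return Index.has_value() && *Index == 0;
}
```

Under this model a mask with some undef lanes still passes, while the rotated mask from the suggested ext shuffle is rejected, which is why that shuffle makes a useful negative test.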

Addressed comments, updating the checks and tests

Harbormaster completed remote builds in B84188: Diff 314861.Jan 6 2021, 5:39 AM

Thanks. LGTM with a couple of suggestions.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11762	You can probably remove this now and just rely on the dyn_cast<ShuffleVectorSDNode> check below.
11772	Maybe move this comment up above the if, to avoid the misleading indentation.
11781–11787	Maybe something like this?

// Ensures the insert is inserting into lane 0
auto *Constant = dyn_cast<ConstantSDNode>(InsertLane.getNode());
if (!Constant || Constant->getZExtValue() != 0)
  return SDValue();

This revision is now accepted and ready to land.Jan 6 2021, 5:44 AM

Closed by commit rG350247a93c07: [AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup)) (authored by NickGuy). · Jan 6 2021, 8:10 AM

This revision was automatically updated to reflect the committed changes.

NickGuy marked 3 inline comments as done.

NickGuy added a commit: rG350247a93c07: [AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup)).

NickGuy mentioned this in D91271: [AArch64] Attempt to sink mul operands.Jan 7 2021, 2:52 AM

This commit caused a crash on our bots with a null LLVMTy at changeExtendedVectorElementType. Reduced case: https://godbolt.org/z/7ehffx

In D91255#2484399, @dmajor wrote:

This commit caused a crash on our bots with a null LLVMTy at changeExtendedVectorElementType. Reduced case: https://godbolt.org/z/7ehffx

Thanks for bringing this to my attention, I've got a fix in review at D94234.

NickGuy mentioned this in rGed23229a64ae: [AArch64] Fix crash caused by invalid vector element type.Jan 8 2021, 4:03 AM

NickGuy mentioned this in rGdda60035e9f0: [AArch64] Attempt to sink mul operands.Jan 13 2021, 7:23 AM

Revision Contents

Path

Size

llvm/

lib/

Target/

AArch64/

AArch64ISelLowering.cpp

143 lines

test/

CodeGen/

AArch64/

aarch64-dup-ext-scalable.ll

327 lines

aarch64-dup-ext.ll

185 lines

Diff 314904

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

(10,932 lines not shown; context: case Instruction::Add: {)

    Ops.push_back(&I->getOperandUse(0));
    Ops.push_back(&I->getOperandUse(1));

    return true;
  }
  default:
    return false;
  }
  return false;
}

bool AArch64TargetLowering::hasPairedLoad(EVT LoadedType,
                                          Align &RequiredAligment) const {
  if (!LoadedType.isSimple() ||
      (!LoadedType.isInteger() && !LoadedType.isFloatingPoint()))
    return false;
  // Cyclone supports unaligned accesses.
  RequiredAligment = Align(1);
  unsigned NumBits = LoadedType.getSizeInBits();
  return NumBits == 32 || NumBits == 64;
}

/// A helper function for determining the number of interleaved accesses we
/// will generate when lowering accesses of the given type.
unsigned

(741 lines not shown; context: static bool IsSVECntIntrinsic(SDValue S) {)

  case Intrinsic::aarch64_sve_cnth:
  case Intrinsic::aarch64_sve_cntw:
  case Intrinsic::aarch64_sve_cntd:
    return true;
  }
  return false;
}

		/// Calculates what the pre-extend type is, based on the extension
		dmgreenUnsubmitted Done Reply Inline Actions Add a message explaining what this function does. It looks generally useful. It might also be worth having this function return the PreExtendType instead, and letting the parent function do the extend to vector type. Either way is fine but it might allow the function to be used for other things, if they come up. dmgreen: Add a message explaining what this function does. It looks generally useful. It might also be…
		/// operation node provided by \p Extend.
		///
		/// In the case that \p Extend is a SIGN_EXTEND or a ZERO_EXTEND, the
		/// pre-extend type is pulled directly from the operand, while other extend
		/// operations need a bit more inspection to get this information.
		///
		/// \param Extend The SDNode from the DAG that represents the extend operation
		/// \param DAG The SelectionDAG hosting the \p Extend node
		///
		dmgreenUnsubmitted Done Reply Inline Actions Although (I know..) I suggested this, it does more than what we might expect here. And there's the chance it could change in any number of weird and wonderful ways in the future. As we are relying on this being a vector shuffle with an insert element and a zero mask, I think we should check for that. I think it should be enough to test that: The shuffle vector has a zero mask (ShuffleVectorSDNode(VectorShuffle)->isSplat(), getSplatIndex()==0) The first first operand is an insert That is inserting into lane 0. dmgreen: Although (I know..) I suggested this, it does more than what we might expect here. And there's…
		/// \returns The type representing the \p Extend source type, or \p MVT::Other
		/// if no valid type can be determined
		static EVT calculatePreExtendType(SDValue Extend, SelectionDAG &DAG) {
		dmgreenUnsubmitted Done Reply Inline Actions SDNode* -> SDValue dmgreen: SDNode* -> SDValue
		switch (Extend.getOpcode()) {
		dmgreenUnsubmitted Done Reply Inline Actions const SDValue &ExtOperand -> SDValue ExtOperand Its also only used in one place and needn't get a variable. dmgreen: const SDValue &ExtOperand -> SDValue ExtOperand Its also only used in one place and needn't…
		case ISD::SIGN_EXTEND:
		case ISD::ZERO_EXTEND:
		return Extend.getOperand(0).getValueType();
		case ISD::AssertSext:
		case ISD::AssertZext:
		case ISD::SIGN_EXTEND_INREG: {
		VTSDNode *TypeNode = dyn_cast<VTSDNode>(Extend.getOperand(1));
		if (!TypeNode)
		return MVT::Other;
		dmgreenUnsubmitted Done Reply Inline Actions const SDValue &TypeOperand -> SDValue TypeOperand dmgreen: const SDValue &TypeOperand -> SDValue TypeOperand
		return TypeNode->getVT();
		}
		dmgreenUnsubmitted Done Reply Inline Actions Can remove debug here and below. dmgreen: Can remove debug here and below.
		case ISD::AND: {
		ConstantSDNode *Constant =
		dyn_cast<ConstantSDNode>(Extend.getOperand(1).getNode());
		if (!Constant)
		return MVT::Other;

		uint32_t Mask = Constant->getZExtValue();

		if (Mask == UCHAR_MAX)
		return MVT::i8;
		else if (Mask == USHRT_MAX)
		return MVT::i16;
		else if (Mask == UINT_MAX)
		return MVT::i32;

		return MVT::Other;
		}
		default:
		return MVT::Other;
		}

		llvm_unreachable("Code path unhandled in calculatePreExtendType!");
		dmgreenUnsubmitted Done Reply Inline Actions Nit: Doesn't need a break like this after a return. dmgreen: Nit: Doesn't need a break like this after a return.
		}

		/// Combines a dup(sext/zext) node pattern into sext/zext(dup)
		dmgreenUnsubmitted Done Reply Inline Actions I think this can be PostExtendType.changeVectorElementType(PreExtendType); dmgreen: I think this can be PostExtendType.changeVectorElementType(PreExtendType);
		/// making use of the vector SExt/ZExt rather than the scalar SExt/ZExt
		static SDValue performCommonVectorExtendCombine(SDValue VectorShuffle,
		SelectionDAG &DAG) {

		ShuffleVectorSDNode *ShuffleNode =
		dmgreenUnsubmitted Done Reply Inline Actions You can probably remove this now and just rely on the dyn_cast<ShuffleVectorSDNode> check below. dmgreen: You can probably remove this now and just rely on the dyn_cast<ShuffleVectorSDNode> check below.
		dyn_cast<ShuffleVectorSDNode>(VectorShuffle.getNode());
		dmgreenUnsubmitted Done Reply Inline Actions SDNode -> SDValue dmgreen: SDNode -> SDValue
		if (!ShuffleNode)
		return SDValue();

		// Ensuring the mask is zero before continuing
		if (!ShuffleNode->isSplat() \|\| ShuffleNode->getSplatIndex() != 0)
		dmgreenUnsubmitted Done Reply Inline Actions This doesn't seem to be checking that the shuffle/insert are actually a splat. There is an isSplatValue method that could help, depending on if it's really checking what this wants to check. dmgreen: This doesn't seem to be checking that the shuffle/insert are actually a splat. There is an…
		NickGuyAuthorUnsubmitted Done Reply Inline Actions Done, though I wasn't able to produce IR that represented a shuffle/insert that wasn't a splat, so I haven't got any tests for this case. NickGuy: Done, though I wasn't able to produce IR that represented a shuffle/insert that wasn't a splat…
		return SDValue();

		SDValue InsertVectorElt = VectorShuffle.getOperand(0);

		dmgreenUnsubmitted Done Reply Inline Actions I may be wrong, but is the `PreExtendVT != MVT::v16i8` half of this ever used? TargetType and PreExtendVT should have the same number of vector elements I believe. And then `TargetType.getScalarSizeInBits() == 2 * PreExtendVT.getScalarSizeInBits()` maybe? That might help this lambda simplify further, possibly to the point that it is easier to inline it, now that it is only used in one place. dmgreen: I may be wrong, but is the `PreExtendVT != MVT::v16i8` half of this ever used? TargetType and…
		NickGuyAuthorUnsubmitted Done Reply Inline Actions Not entirely sure it is, I added that check to explicitly allow the types accepted by u/smull. I've changed this now. NickGuy: Not entirely sure it is, I added that check to explicitly allow the types accepted by u/smull.
		dmgreenUnsubmitted Done Reply Inline Actions Maybe move this comment up above the if, to avoid the misleading indentation. dmgreen: Maybe move this comment up above the if, to avoid the misleading indentation.
		if (InsertVectorElt.getOpcode() != ISD::INSERT_VECTOR_ELT)
		return SDValue();

		SDValue InsertLane = InsertVectorElt.getOperand(2);
		ConstantSDNode *Constant = dyn_cast<ConstantSDNode>(InsertLane.getNode());
		// Ensures the insert is inserting into lane 0
		if (!Constant \|\| Constant->getZExtValue() != 0)
		return SDValue();

		SDValue Extend = InsertVectorElt.getOperand(1);
		unsigned ExtendOpcode = Extend.getOpcode();

		bool IsSExt = ExtendOpcode == ISD::SIGN_EXTEND \|\|
		ExtendOpcode == ISD::SIGN_EXTEND_INREG \|\|
		ExtendOpcode == ISD::AssertSext;
		dmgreenUnsubmitted Done Reply Inline Actions Maybe something like this? // Ensures the insert is inserting into lane 0 auto Constant = dyn_cast<ConstantSDNode>(InsertLane.getNode()); if (!Constant \|\| Constant->getZExtValue() != 0) return SDValue(); dmgreen:* Maybe something like this? ``` // Ensures the insert is inserting into lane 0 auto…
		if (!IsSExt && ExtendOpcode != ISD::ZERO_EXTEND &&
		dmgreenUnsubmitted Done Reply Inline Actions There is a convenience function, DAG.getUNDEF(VT); Probably doesn't need a lambda as a result. dmgreen: There is a convenience function, DAG.getUNDEF(VT); Probably doesn't need a lambda as a result.
		NickGuyAuthorUnsubmitted Done Reply Inline Actions Ah, thanks for pointing that out. Lambda is removed, too. NickGuy: Ah, thanks for pointing that out. Lambda is removed, too.
		ExtendOpcode != ISD::AssertZext && ExtendOpcode != ISD::AND)
		return SDValue();
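
Note on the opcode check above: it accepts ISD::AND alongside the zero-extend opcodes and ISD::SIGN_EXTEND_INREG alongside the sign-extend opcodes. A minimal scalar sketch of why those forms behave like an extend of the low bits follows. It is plain C++ rather than LLVM API, `zextLow8` and `sextLow8` are hypothetical names, and the 8-bit width is only an example; which masks and widths the patch actually accepts is decided in calculatePreExtendType, which is not shown in this view.

```cpp
#include <cstdint>

// Scalar model of (and x, 0xff): keeps the low 8 bits and clears the rest,
// which is the value a zero-extend from i8 would produce.
uint32_t zextLow8(uint32_t Reg) { return Reg & 0xffu; }

// Scalar model of (sign_extend_inreg x, i8): the low 8 bits are kept and
// bit 7 is copied into every higher bit.
int32_t sextLow8(uint32_t Reg) {
  int32_t Low = static_cast<int32_t>(Reg & 0xffu);
  return Low >= 0x80 ? Low - 0x100 : Low;
}
```

In both cases only the low 8 bits of the input matter, which is what allows the extend to be re-applied after the value has been splatted in its narrow form.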

		EVT TargetType = VectorShuffle.getValueType();
		EVT PreExtendType = calculatePreExtendType(Extend, DAG);

		if ((TargetType != MVT::v8i16 && TargetType != MVT::v4i32 &&
		TargetType != MVT::v2i64) ||
		(PreExtendType == MVT::Other))
		return SDValue();

		EVT PreExtendVT = TargetType.changeVectorElementType(PreExtendType);

		if (PreExtendVT.getVectorElementCount() != TargetType.getVectorElementCount())
		return SDValue();

		if (TargetType.getScalarSizeInBits() != PreExtendVT.getScalarSizeInBits() * 2)
		dmgreen: Nit: DebugLoc is usually called just DL (I think DebugLoc is already the name of the type used in Machine Instructions).
		return SDValue();
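
The type checks above can be modelled outside of LLVM roughly as follows. This is only a sketch: `VecTy` and `isWideningCandidate` are hypothetical stand-ins for EVT and the inline checks, and the MVT::Other bail-out is left out.

```cpp
// Hypothetical stand-in for a fixed-width vector type: lane count and lane width.
struct VecTy {
  unsigned NumElts;
  unsigned ScalarBits;
};

// Mirrors the checks in the hunk above: the extended (target) type must be
// v8i16, v4i32 or v2i64, and the pre-extend type must have the same number
// of lanes at exactly half the lane width.
bool isWideningCandidate(VecTy Target, VecTy PreExtend) {
  bool IsSupportedTarget = (Target.NumElts == 8 && Target.ScalarBits == 16) ||
                           (Target.NumElts == 4 && Target.ScalarBits == 32) ||
                           (Target.NumElts == 2 && Target.ScalarBits == 64);
  return IsSupportedTarget && PreExtend.NumElts == Target.NumElts &&
         Target.ScalarBits == 2 * PreExtend.ScalarBits;
}
```

Under this model an i8 splat feeding a v4i32 multiply is rejected, because the lane width would have to grow by a factor of four rather than two.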

		SDLoc DL(VectorShuffle);

		dmgreen: Nit: It doesn't need the {} around the operands, there are overloaded methods to handle that automatically already.
		SDValue InsertVectorNode = DAG.getNode(
		InsertVectorElt.getOpcode(), DL, PreExtendVT, DAG.getUNDEF(PreExtendVT),
		Extend.getOperand(0), DAG.getConstant(0, DL, MVT::i64));

		std::vector<int> ShuffleMask(TargetType.getVectorElementCount().getValue());

		SDValue VectorShuffleNode =
		dmgreen: It may be simpler to generate the AArch64ISD::DUP directly? I'm not sure either way which is better, but it's less nodes and we know it's going to get there eventually. Be careful about illegal types though.
		NickGuy: Generating the DUP directly, while simpler at this point, failed due to this combine being performed fairly early, and the DUP not being handled as part of lowering. Given that, keeping it as shuffle_vector at this point seems simpler, as we can also benefit from the existing VECTOR_SHUFFLE->DUP lowering checks.
		DAG.getVectorShuffle(PreExtendVT, DL, InsertVectorNode,
		DAG.getUNDEF(PreExtendVT), ShuffleMask);
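
A note on the mask used here: `std::vector<int> ShuffleMask(N)` value-initialises every entry to 0, so every result lane selects lane 0 of the first operand, i.e. a splat of the element that was just inserted. A small stand-alone model of that behaviour is sketched below. It is plain C++, `shuffleModel` and `splatMask` are hypothetical helpers, and undef (-1) mask entries are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Models a two-input vector shuffle: mask entry M selects A[M] when
// M < A.size(), otherwise B[M - A.size()].
std::vector<int> shuffleModel(const std::vector<int> &A,
                              const std::vector<int> &B,
                              const std::vector<int> &Mask) {
  std::vector<int> Out;
  Out.reserve(Mask.size());
  for (int M : Mask) {
    // Undef (-1) entries are outside the scope of this sketch.
    assert(M >= 0 && static_cast<std::size_t>(M) < A.size() + B.size());
    std::size_t Idx = static_cast<std::size_t>(M);
    Out.push_back(Idx < A.size() ? A[Idx] : B[Idx - A.size()]);
  }
  return Out;
}

// Builds the same kind of mask as the patch: N entries, all zero.
std::vector<int> splatMask(std::size_t N) { return std::vector<int>(N); }
```

Because the mask never refers to the second operand, that operand can be undef, as it is in the getVectorShuffle call above.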

		SDValue ExtendNode =
		dmgreen: Debug messages are fairly uncommon in ISel Lowering. It tends to trigger a lot of times where they are not important, and the debug messages it prints already are usually enough to give a rough indication of what is going on. This whole function probably simplifies a lot I think? Is it something like TargetType.getHalfNumVectorElementsVT == PreExtendVT and the TargetType is one of {...}? I'm not sure how the MVT::v16i8 extended to MVT::v8i16 works out, from ISel. I think because ISel is type checked, those cases would not come up even if the underlying instructions supported it? We are not optimizing smull2 as far as I understand.
		NickGuy: > Debug messages are fairly uncommon in ISel Lowering
		Removed the debug messages.
		DAG.getNode(IsSExt ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND, DL, TargetType,
		VectorShuffleNode, DAG.getValueType(TargetType));

		return ExtendNode;
		}

		/// Combines a mul(dup(sext/zext)) node pattern into mul(sext/zext(dup))
		/// making use of the vector SExt/ZExt rather than the scalar SExt/ZExt
		static SDValue performMulVectorExtendCombine(SDNode *Mul, SelectionDAG &DAG) {
		// If the value type isn't a vector, none of the operands are going to be dups
		dmgreen: I think this will always produce a new value, as opposed to returning SDValue when nothing is done. It's probably best to make it something like this instead:
		    SDValue Op0 = performCommonVectorExtendCombine(Mul->getOperand(0), DAG);
		    SDValue Op1 = performCommonVectorExtendCombine(Mul->getOperand(1), DAG);
		    if (!Op0 && !Op1)
		      return SDValue();
		    SDLoc DL(Mul);
		    return DAG.getNode(Mul->getOpcode(), DL, Mul->getValueType(0),
		                       Op0 ? Op0 : Mul->getOperand(0),
		                       Op1 ? Op1 : Mul->getOperand(1));
		if (!Mul->getValueType(0).isVector())
		return SDValue();

		SDValue Op0 = performCommonVectorExtendCombine(Mul->getOperand(0), DAG);
		SDValue Op1 = performCommonVectorExtendCombine(Mul->getOperand(1), DAG);

		// Neither operand has been changed, don't make any further changes
		if (!Op0 && !Op1)
		return SDValue();

		SDLoc DL(Mul);
		return DAG.getNode(Mul->getOpcode(), DL, Mul->getValueType(0),
		Op0 ? Op0 : Mul->getOperand(0),
		Op1 ? Op1 : Mul->getOperand(1));
		}
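
For reference, the rearrangement is value-preserving because widening a scalar and then splatting it yields the same lanes as splatting the narrow scalar and then widening every lane. A stand-alone sketch for the sext, i8 to v8i16 case follows. It uses plain C++ with std::array standing in for vectors; all names are hypothetical and none of this is LLVM API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V8i8 = std::array<int8_t, 8>;
using V8u16 = std::array<uint16_t, 8>; // 16-bit lanes, stored as bit patterns

// Lane-wise wrapping multiply, like a vector mul on 16-bit lanes.
V8u16 mulLanes(const V8u16 &A, const V8u16 &B) {
  V8u16 Out{};
  for (std::size_t I = 0; I < Out.size(); ++I)
    Out[I] = static_cast<uint16_t>(static_cast<uint32_t>(A[I]) * B[I]);
  return Out;
}

// Before the combine: dup(sext(x)). The scalar is widened, then splatted.
V8u16 dupOfSext(int8_t X) {
  V8u16 Out{};
  Out.fill(static_cast<uint16_t>(static_cast<int16_t>(X)));
  return Out;
}

// After the combine: sext(dup(x)). The narrow scalar is splatted, then every
// lane is widened by a vector extend.
V8u16 sextOfDup(int8_t X) {
  V8i8 Narrow{};
  Narrow.fill(X);
  V8u16 Out{};
  for (std::size_t I = 0; I < Out.size(); ++I)
    Out[I] = static_cast<uint16_t>(static_cast<int16_t>(Narrow[I]));
  return Out;
}
```

Both forms feed identical lanes into the multiply. The second form only changes where the extend sits, which per the summary is what lets the existing smull/umull patterns match.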

static SDValue performMulCombine(SDNode *N, SelectionDAG &DAG,
TargetLowering::DAGCombinerInfo &DCI,
const AArch64Subtarget *Subtarget) {

		if (SDValue Ext = performMulVectorExtendCombine(N, DAG))
		dmgreen: Do we need to check both shuffles and DUP's? Is that for the i64 types that are otherwise illegal somehow?
		NickGuy: No, we don't. Looks like checking for shuffles catches the same DUP cases before they become DUPs. I've removed the DUP-specific check, and have unified it down to the VectorShuffle check.
		return Ext;

if (DCI.isBeforeLegalizeOps())
return SDValue();

// The below optimizations require a constant RHS.
if (!isa<ConstantSDNode>(N->getOperand(1)))
return SDValue();

SDValue N0 = N->getOperand(0);
ConstantSDNode *C = cast<ConstantSDNode>(N->getOperand(1));
const APInt &ConstValue = C->getAPIntValue();

// Allow the scaling to be folded into the `cnt` instruction by preventing
// the scaling to be obscured here. This makes it easier to pattern match.
if (IsSVECntIntrinsic(N0) ||
(N0->getOpcode() == ISD::TRUNCATE &&
(IsSVECntIntrinsic(N0->getOperand(0)))))
if (ConstValue.sge(1) && ConstValue.sle(16))
return SDValue();

// Multiplication of a power of two plus/minus one can be done more
// cheaply as shift+add/sub. For now, this is true unilaterally. If
// future CPUs have a cheaper MADD instruction, this may need to be
// gated on a subtarget feature. For Cyclone, 32-bit MADD is 4 cycles and
		dmgreen: A mul should always be an integer I think?
		NickGuy: That was a remnant of before we were restricting this to muls only. Regardless, this has now been removed.
// 64-bit is 5 cycles, so this is always a win.
// More aggressively, some multiplications N0 * C can be lowered to
// shift+add+shift if the constant C = A * B where A = 2^N + 1 and B = 2^M,
// e.g. 6=3*2=(2+1)*2.
// TODO: consider lowering more cases, e.g. C = 14, -6, -14 or even 45
// which equals to (1+2)*16-(1+2).
// TrailingZeroes is used to test if the mul can be lowered to
// shift+add+shift.
unsigned TrailingZeroes = ConstValue.countTrailingZeros();
if (TrailingZeroes) {
// Conservatively do not lower to shift+add+shift if the mul might be
// folded into smul or umul.
if (N0->hasOneUse() && (isSignExtended(N0.getNode(), DAG) ||
isZeroExtended(N0.getNode(), DAG)))
return SDValue();
// Conservatively do not lower to shift+add+shift if the mul might be
// folded into madd or msub.
if (N->hasOneUse() && (N->use_begin()->getOpcode() == ISD::ADD ||
N->use_begin()->getOpcode() == ISD::SUB))
return SDValue();
}
// Use ShiftedConstValue instead of ConstValue to support both shift+add/sub
// and shift+add+shift.
APInt ShiftedConstValue = ConstValue.ashr(TrailingZeroes);

unsigned ShiftAmt, AddSubOpc;
		dmgreen: This comment is very short (as in - the length of the line). It probably doesn't need the newline before it either.
// Is the shifted value the LHS operand of the add/sub?
bool ShiftValUseIsN0 = true;
// Do we need to negate the result?
bool NegateResult = false;

if (ConstValue.isNonNegative()) {
// (mul x, 2^N + 1) => (add (shl x, N), x)
		dmgreen: Multiplies (and other BinOps) only have 2 operands. It would probably be simpler to just check both the operands as opposed to needing the loop.
		NickGuy: > It would probably be simpler to just check both the operands as opposed to needing the loop.
		I disagree. Having the loop here makes it clearer that each operand is being handled in the same way, while having them separated needlessly duplicates the code.
		dmgreen: It would seem simpler (and smaller) to just call performCommonVectorExtendCombine for each operand. It may need to use something like `Op0 ? Op0 : Mul->getOperand(0)`, but it would remove the need to track Changed and most of the rest of this loop.
// (mul x, 2^N - 1) => (sub (shl x, N), x)
// (mul x, (2^N + 1) * 2^M) => (shl (add (shl x, N), x), M)
APInt SCVMinus1 = ShiftedConstValue - 1;
APInt CVPlus1 = ConstValue + 1;
if (SCVMinus1.isPowerOf2()) {
ShiftAmt = SCVMinus1.logBase2();
AddSubOpc = ISD::ADD;
} else if (CVPlus1.isPowerOf2()) {
ShiftAmt = CVPlus1.logBase2();
		dmgreen: This also seems to make changes without necessarily detecting that they are useful. That is probably fine here considering what is being transformed, but it can be better to represent it as "test if it will help, if so then make the change".
AddSubOpc = ISD::SUB;
} else
return SDValue();
} else {
// (mul x, -(2^N - 1)) => (sub x, (shl x, N))
// (mul x, -(2^N + 1)) => - (add (shl x, N), x)
APInt CVNegPlus1 = -ConstValue + 1;
APInt CVNegMinus1 = -ConstValue - 1;
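
The identities quoted in the comments of this hunk can be checked with ordinary wrapping integer arithmetic. A minimal sketch follows. It is plain C++, the function names are hypothetical, and uint64_t is used so that overflow wraps the way a 64-bit mul does; shift amounts are assumed to be below 64.

```cpp
#include <cstdint>

// (mul x, 2^N + 1) => (add (shl x, N), x)
uint64_t mulByPow2Plus1(uint64_t X, unsigned N) { return (X << N) + X; }

// (mul x, 2^N - 1) => (sub (shl x, N), x)
uint64_t mulByPow2Minus1(uint64_t X, unsigned N) { return (X << N) - X; }

// (mul x, (2^N + 1) * 2^M) => (shl (add (shl x, N), x), M)
uint64_t mulByPow2Plus1Shifted(uint64_t X, unsigned N, unsigned M) {
  return ((X << N) + X) << M;
}

// (mul x, -(2^N - 1)) => (sub x, (shl x, N))
uint64_t mulByNegPow2Minus1(uint64_t X, unsigned N) { return X - (X << N); }
```

For example, the constant 6 from the comment above is (2^1 + 1) * 2^1, so `mulByPow2Plus1Shifted(X, 1, 1)` computes 6 * X with two shifts and one add.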
[3,563 lines not shown]	static SDValue combineSVEPrefetchVecBaseImmOff(SDNode *N, SelectionDAG &DAG,
// ...and remap the intrinsic `aarch64_sve_prf<T>_gather_scalar_offset` to
// `aarch64_sve_prfb_gather_uxtw_index`.
SDLoc DL(N);
Ops[1] = DAG.getConstant(Intrinsic::aarch64_sve_prfb_gather_uxtw_index, DL,
MVT::i64);

return DAG.getNode(N->getOpcode(), DL, DAG.getVTList(MVT::Other), Ops);
}

		SjoerdMeijer: Nit: A comment on what we are exactly combining here.
SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
		SjoerdMeijer: Nit: performDUPSextCombine -> performDUPSExtCombine
		NickGuy: I opted for performDUPExtCombine, as it performs both signed and zero extends
DAGCombinerInfo &DCI) const {
		SjoerdMeijer: Nit: think the coding style is to use: `auto *Operand = ...` when it is a pointer. Same for more declarations below.
		NickGuy: None of these are pointer types, though that's good to know for future :)
SelectionDAG &DAG = DCI.DAG;
switch (N->getOpcode()) {
default:
		dmgreen: The llvm style guide suggests not to over-use auto like this. It just makes it more difficult telling what things are.
LLVM_DEBUG(dbgs() << "Custom combining: skipping\n");
break;
case ISD::ABS:
		SjoerdMeijer: Can you run clang-format on your patch? I've tried reading this function, but all these lint messages make it pretty unreadable to me.
		NickGuy: Done, sorry about that
return performABSCombine(N, DAG, DCI, Subtarget);
case ISD::ADD:
case ISD::SUB:
		dmgreen: There is no such thing as a "non-vector dup", as far as I understand.
return performAddSubCombine(N, DCI, DAG);
case ISD::XOR:
return performXorCombine(N, DAG, DCI, Subtarget);
case ISD::MUL:
return performMulCombine(N, DAG, DCI, Subtarget);
case ISD::SINT_TO_FP:
case ISD::UINT_TO_FP:
return performIntToFpCombine(N, DAG, Subtarget);
case ISD::FP_TO_SINT:
case ISD::FP_TO_UINT:
return performFpToIntCombine(N, DAG, DCI, Subtarget);
case ISD::FDIV:
return performFDivCombine(N, DAG, DCI, Subtarget);
case ISD::OR:
return performORCombine(N, DCI, Subtarget);
		dmgreen: I'm not sure what this is doing. It should be sign extending from the smaller type with the correct number of vector lanes (either ExtOperand.getValueType() for a sext/zext or Operand.getOperand(1) for a SIGN_EXTEND_INREG, and something based on the mask for an AND). It then probably has to make sure that the new extend is a legal operation.
case ISD::AND:
return performANDCombine(N, DCI);
case ISD::SRL:
return performSRLCombine(N, DCI);
		dmgreen: Create a `SDLoc DL(N)` and use it in both getNodes. This doesn't need getVTList for single types. Or {} for the operands I don't think.
case ISD::INTRINSIC_WO_CHAIN:
return performIntrinsicCombine(N, DCI, Subtarget);
case ISD::ANY_EXTEND:
case ISD::ZERO_EXTEND:
case ISD::SIGN_EXTEND:
return performExtendCombine(N, DCI, DAG);
case ISD::SIGN_EXTEND_INREG:
return performSignExtendInRegCombine(N, DCI, DAG);
case ISD::TRUNCATE:
return performVectorTruncateCombine(N, DCI, DAG);
case ISD::CONCAT_VECTORS:
return performConcatVectorsCombine(N, DCI, DAG);
case ISD::SELECT:
		NickGuy: I'm unsure as to when PerformDAGCombine is invoked. If this function generates a new DUP node, would this function then be invoked with that node? Or does this function need a bit more scaffolding to support this case?
return performSelectCombine(N, DCI);
		dmgreen: Detecting the extend is probably best done inside the function. It's common to do: `if (SDValue Ext = performDUPExtCombine(..)) return Ext;` I would personally leave out at least AssertSext and AssertZext until you at least have a test that shows them being needed. If this does SIGN_EXTEND_INREG it should probably handle the equivalent AND as well.
case ISD::VSELECT:
return performVSelectCombine(N, DCI.DAG);
case ISD::LOAD:
if (performTBISimplification(N->getOperand(1), DCI, DAG))
return SDValue(N, 0);
break;
case ISD::STORE:
return performSTORECombine(N, DCI, DAG, Subtarget);
[1,585 lines not shown]

llvm/test/CodeGen/AArch64/aarch64-dup-ext-scalable.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple aarch64-none-linux-gnu -mattr=+sve | FileCheck %s

				define <vscale x 2 x i16> @dupsext_v2i8_v2i16(i8 %src, <vscale x 2 x i16> %b) {
				; CHECK-LABEL: dupsext_v2i8_v2i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxtb w8, w0
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i16
				%broadcast.splatinsert = insertelement <vscale x 2 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <vscale x 2 x i16> %broadcast.splatinsert, <vscale x 2 x i16> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nsw <vscale x 2 x i16> %broadcast.splat, %b
				ret <vscale x 2 x i16> %out
				}

				define <vscale x 4 x i16> @dupsext_v4i8_v4i16(i8 %src, <vscale x 4 x i16> %b) {
				; CHECK-LABEL: dupsext_v4i8_v4i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxtb w8, w0
				; CHECK-NEXT: mov z1.s, w8
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i16
				%broadcast.splatinsert = insertelement <vscale x 4 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <vscale x 4 x i16> %broadcast.splatinsert, <vscale x 4 x i16> undef, <vscale x 4 x i32> zeroinitializer
				%out = mul nsw <vscale x 4 x i16> %broadcast.splat, %b
				ret <vscale x 4 x i16> %out
				}

				define <vscale x 8 x i16> @dupsext_v8i8_v8i16(i8 %src, <vscale x 8 x i16> %b) {
				; CHECK-LABEL: dupsext_v8i8_v8i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxtb w8, w0
				; CHECK-NEXT: mov z1.h, w8
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mul z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i16
				%broadcast.splatinsert = insertelement <vscale x 8 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <vscale x 8 x i16> %broadcast.splatinsert, <vscale x 8 x i16> undef, <vscale x 8 x i32> zeroinitializer
				%out = mul nsw <vscale x 8 x i16> %broadcast.splat, %b
				ret <vscale x 8 x i16> %out
				}

				define <vscale x 2 x i32> @dupsext_v2i8_v2i32(i8 %src, <vscale x 2 x i32> %b) {
				; CHECK-LABEL: dupsext_v2i8_v2i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxtb w8, w0
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i32
				%broadcast.splatinsert = insertelement <vscale x 2 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <vscale x 2 x i32> %broadcast.splatinsert, <vscale x 2 x i32> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nsw <vscale x 2 x i32> %broadcast.splat, %b
				ret <vscale x 2 x i32> %out
				}

				define <vscale x 4 x i32> @dupsext_v4i8_v4i32(i8 %src, <vscale x 4 x i32> %b) {
				; CHECK-LABEL: dupsext_v4i8_v4i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxtb w8, w0
				; CHECK-NEXT: mov z1.s, w8
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i32
				%broadcast.splatinsert = insertelement <vscale x 4 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <vscale x 4 x i32> %broadcast.splatinsert, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer
				%out = mul nsw <vscale x 4 x i32> %broadcast.splat, %b
				ret <vscale x 4 x i32> %out
				}

				define <vscale x 2 x i64> @dupsext_v2i8_v2i64(i8 %src, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: dupsext_v2i8_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: sxtb x8, w0
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i64
				%broadcast.splatinsert = insertelement <vscale x 2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <vscale x 2 x i64> %broadcast.splatinsert, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nsw <vscale x 2 x i64> %broadcast.splat, %b
				ret <vscale x 2 x i64> %out
				}

				define <vscale x 2 x i32> @dupsext_v2i16_v2i32(i16 %src, <vscale x 2 x i32> %b) {
				; CHECK-LABEL: dupsext_v2i16_v2i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxth w8, w0
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = sext i16 %src to i32
				%broadcast.splatinsert = insertelement <vscale x 2 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <vscale x 2 x i32> %broadcast.splatinsert, <vscale x 2 x i32> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nsw <vscale x 2 x i32> %broadcast.splat, %b
				ret <vscale x 2 x i32> %out
				}

				define <vscale x 4 x i32> @dupsext_v4i16_v4i32(i16 %src, <vscale x 4 x i32> %b) {
				; CHECK-LABEL: dupsext_v4i16_v4i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxth w8, w0
				; CHECK-NEXT: mov z1.s, w8
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%in = sext i16 %src to i32
				%broadcast.splatinsert = insertelement <vscale x 4 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <vscale x 4 x i32> %broadcast.splatinsert, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer
				%out = mul nsw <vscale x 4 x i32> %broadcast.splat, %b
				ret <vscale x 4 x i32> %out
				}

				define <vscale x 2 x i64> @dupsext_v2i16_v2i64(i16 %src, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: dupsext_v2i16_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: sxth x8, w0
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = sext i16 %src to i64
				%broadcast.splatinsert = insertelement <vscale x 2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <vscale x 2 x i64> %broadcast.splatinsert, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nsw <vscale x 2 x i64> %broadcast.splat, %b
				ret <vscale x 2 x i64> %out
				}

				define <vscale x 2 x i64> @dupsext_v2i32_v2i64(i32 %src, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: dupsext_v2i32_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: sxtw x8, w0
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = sext i32 %src to i64
				%broadcast.splatinsert = insertelement <vscale x 2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <vscale x 2 x i64> %broadcast.splatinsert, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nsw <vscale x 2 x i64> %broadcast.splat, %b
				ret <vscale x 2 x i64> %out
				}

				define <vscale x 2 x i16> @dupzext_v2i8_v2i16(i8 %src, <vscale x 2 x i16> %b) {
				; CHECK-LABEL: dupzext_v2i8_v2i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: and w8, w0, #0xff
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = zext i8 %src to i16
				%broadcast.splatinsert = insertelement <vscale x 2 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <vscale x 2 x i16> %broadcast.splatinsert, <vscale x 2 x i16> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nuw <vscale x 2 x i16> %broadcast.splat, %b
				ret <vscale x 2 x i16> %out
				}

				define <vscale x 4 x i16> @dupzext_v4i8_v4i16(i8 %src, <vscale x 4 x i16> %b) {
				; CHECK-LABEL: dupzext_v4i8_v4i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: and w8, w0, #0xff
				; CHECK-NEXT: mov z1.s, w8
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%in = zext i8 %src to i16
				%broadcast.splatinsert = insertelement <vscale x 4 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <vscale x 4 x i16> %broadcast.splatinsert, <vscale x 4 x i16> undef, <vscale x 4 x i32> zeroinitializer
				%out = mul nuw <vscale x 4 x i16> %broadcast.splat, %b
				ret <vscale x 4 x i16> %out
				}

				define <vscale x 8 x i16> @dupzext_v8i8_v8i16(i8 %src, <vscale x 8 x i16> %b) {
				; CHECK-LABEL: dupzext_v8i8_v8i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: and w8, w0, #0xff
				; CHECK-NEXT: mov z1.h, w8
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mul z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				entry:
				%in = zext i8 %src to i16
				%broadcast.splatinsert = insertelement <vscale x 8 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <vscale x 8 x i16> %broadcast.splatinsert, <vscale x 8 x i16> undef, <vscale x 8 x i32> zeroinitializer
				%out = mul nuw <vscale x 8 x i16> %broadcast.splat, %b
				ret <vscale x 8 x i16> %out
				}

				define <vscale x 2 x i32> @dupzext_v2i8_v2i32(i8 %src, <vscale x 2 x i32> %b) {
				; CHECK-LABEL: dupzext_v2i8_v2i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: and w8, w0, #0xff
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = zext i8 %src to i32
				%broadcast.splatinsert = insertelement <vscale x 2 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <vscale x 2 x i32> %broadcast.splatinsert, <vscale x 2 x i32> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nuw <vscale x 2 x i32> %broadcast.splat, %b
				ret <vscale x 2 x i32> %out
				}

				define <vscale x 4 x i32> @dupzext_v4i8_v4i32(i8 %src, <vscale x 4 x i32> %b) {
				; CHECK-LABEL: dupzext_v4i8_v4i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: and w8, w0, #0xff
				; CHECK-NEXT: mov z1.s, w8
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%in = zext i8 %src to i32
				%broadcast.splatinsert = insertelement <vscale x 4 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <vscale x 4 x i32> %broadcast.splatinsert, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer
				%out = mul nuw <vscale x 4 x i32> %broadcast.splat, %b
				ret <vscale x 4 x i32> %out
				}

				define <vscale x 2 x i64> @dupzext_v2i8_v2i64(i8 %src, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: dupzext_v2i8_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: and x8, x0, #0xff
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = zext i8 %src to i64
				%broadcast.splatinsert = insertelement <vscale x 2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <vscale x 2 x i64> %broadcast.splatinsert, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nuw <vscale x 2 x i64> %broadcast.splat, %b
				ret <vscale x 2 x i64> %out
				}

				define <vscale x 2 x i32> @dupzext_v2i16_v2i32(i16 %src, <vscale x 2 x i32> %b) {
				; CHECK-LABEL: dupzext_v2i16_v2i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: and w8, w0, #0xffff
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = zext i16 %src to i32
				%broadcast.splatinsert = insertelement <vscale x 2 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <vscale x 2 x i32> %broadcast.splatinsert, <vscale x 2 x i32> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nuw <vscale x 2 x i32> %broadcast.splat, %b
				ret <vscale x 2 x i32> %out
				}

				define <vscale x 4 x i32> @dupzext_v4i16_v4i32(i16 %src, <vscale x 4 x i32> %b) {
				; CHECK-LABEL: dupzext_v4i16_v4i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: and w8, w0, #0xffff
				; CHECK-NEXT: mov z1.s, w8
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%in = zext i16 %src to i32
				%broadcast.splatinsert = insertelement <vscale x 4 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <vscale x 4 x i32> %broadcast.splatinsert, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer
				%out = mul nuw <vscale x 4 x i32> %broadcast.splat, %b
				ret <vscale x 4 x i32> %out
				}

				define <vscale x 2 x i64> @dupzext_v2i16_v2i64(i16 %src, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: dupzext_v2i16_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: and x8, x0, #0xffff
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = zext i16 %src to i64
				%broadcast.splatinsert = insertelement <vscale x 2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <vscale x 2 x i64> %broadcast.splatinsert, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nuw <vscale x 2 x i64> %broadcast.splat, %b
				ret <vscale x 2 x i64> %out
				}

				define <vscale x 2 x i64> @dupzext_v2i32_v2i64(i32 %src, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: dupzext_v2i32_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%in = zext i32 %src to i64
				%broadcast.splatinsert = insertelement <vscale x 2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <vscale x 2 x i64> %broadcast.splatinsert, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
				%out = mul nuw <vscale x 2 x i64> %broadcast.splat, %b
				ret <vscale x 2 x i64> %out
				}

llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple aarch64-none-linux-gnu | FileCheck %s

				; Supported combines

				define <8 x i16> @dupsext_v8i8_v8i16(i8 %src, <8 x i8> %b) {
				; CHECK-LABEL: dupsext_v8i8_v8i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: dup v1.8b, w0
				; CHECK-NEXT: smull v0.8h, v1.8b, v0.8b
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i16
				%ext.b = sext <8 x i8> %b to <8 x i16>
				%broadcast.splatinsert = insertelement <8 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <8 x i16> %broadcast.splatinsert, <8 x i16> undef, <8 x i32> zeroinitializer
				%out = mul nsw <8 x i16> %broadcast.splat, %ext.b
				ret <8 x i16> %out
				}

				define <8 x i16> @dupzext_v8i8_v8i16(i8 %src, <8 x i8> %b) {
				; CHECK-LABEL: dupzext_v8i8_v8i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: dup v1.8b, w0
				; CHECK-NEXT: umull v0.8h, v1.8b, v0.8b
				; CHECK-NEXT: ret
				entry:
				%in = zext i8 %src to i16
				%ext.b = zext <8 x i8> %b to <8 x i16>
				%broadcast.splatinsert = insertelement <8 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <8 x i16> %broadcast.splatinsert, <8 x i16> undef, <8 x i32> zeroinitializer
				%out = mul nuw <8 x i16> %broadcast.splat, %ext.b
				ret <8 x i16> %out
				}

				define <4 x i32> @dupsext_v4i16_v4i32(i16 %src, <4 x i16> %b) {
				; CHECK-LABEL: dupsext_v4i16_v4i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: dup v1.4h, w0
				; CHECK-NEXT: smull v0.4s, v1.4h, v0.4h
				; CHECK-NEXT: ret
				entry:
				%in = sext i16 %src to i32
				%ext.b = sext <4 x i16> %b to <4 x i32>
				%broadcast.splatinsert = insertelement <4 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
				%out = mul nsw <4 x i32> %broadcast.splat, %ext.b
				ret <4 x i32> %out
				}

				define <4 x i32> @dupzext_v4i16_v4i32(i16 %src, <4 x i16> %b) {
				; CHECK-LABEL: dupzext_v4i16_v4i32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: dup v1.4h, w0
				; CHECK-NEXT: umull v0.4s, v1.4h, v0.4h
				; CHECK-NEXT: ret
				entry:
				%in = zext i16 %src to i32
				%ext.b = zext <4 x i16> %b to <4 x i32>
				%broadcast.splatinsert = insertelement <4 x i32> undef, i32 %in, i32 0
				%broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
				%out = mul nuw <4 x i32> %broadcast.splat, %ext.b
				ret <4 x i32> %out
				}

				define <2 x i64> @dupsext_v2i32_v2i64(i32 %src, <2 x i32> %b) {
				; CHECK-LABEL: dupsext_v2i32_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: dup v1.2s, w0
				; CHECK-NEXT: smull v0.2d, v1.2s, v0.2s
				; CHECK-NEXT: ret
				entry:
				%in = sext i32 %src to i64
				%ext.b = sext <2 x i32> %b to <2 x i64>
				%broadcast.splatinsert = insertelement <2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <2 x i64> %broadcast.splatinsert, <2 x i64> undef, <2 x i32> zeroinitializer
				%out = mul nsw <2 x i64> %broadcast.splat, %ext.b
				ret <2 x i64> %out
				}

				define <2 x i64> @dupzext_v2i32_v2i64(i32 %src, <2 x i32> %b) {
				; CHECK-LABEL: dupzext_v2i32_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: dup v1.2s, w0
				; CHECK-NEXT: umull v0.2d, v1.2s, v0.2s
				; CHECK-NEXT: ret
				entry:
				%in = zext i32 %src to i64
				%ext.b = zext <2 x i32> %b to <2 x i64>
				%broadcast.splatinsert = insertelement <2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <2 x i64> %broadcast.splatinsert, <2 x i64> undef, <2 x i32> zeroinitializer
				%out = mul nuw <2 x i64> %broadcast.splat, %ext.b
				ret <2 x i64> %out
				}

				; Unsupported combines

				define <2 x i16> @dupsext_v2i8_v2i16(i8 %src, <2 x i8> %b) {
				; CHECK-LABEL: dupsext_v2i8_v2i16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxtb w8, w0
				; CHECK-NEXT: shl v0.2s, v0.2s, #24
				; CHECK-NEXT: sshr v0.2s, v0.2s, #24
				; CHECK-NEXT: dup v1.2s, w8
				; CHECK-NEXT: mul v0.2s, v1.2s, v0.2s
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i16
				%ext.b = sext <2 x i8> %b to <2 x i16>
				%broadcast.splatinsert = insertelement <2 x i16> undef, i16 %in, i16 0
				%broadcast.splat = shufflevector <2 x i16> %broadcast.splatinsert, <2 x i16> undef, <2 x i32> zeroinitializer
				%out = mul nsw <2 x i16> %broadcast.splat, %ext.b
				ret <2 x i16> %out
				}

				define <2 x i64> @dupzext_v2i16_v2i64(i16 %src, <2 x i16> %b) {
				; CHECK-LABEL: dupzext_v2i16_v2i64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: movi d1, #0x00ffff0000ffff
				; CHECK-NEXT: and v0.8b, v0.8b, v1.8b
				; CHECK-NEXT: ushll v0.2d, v0.2s, #0
				; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
				; CHECK-NEXT: and x8, x0, #0xffff
				; CHECK-NEXT: fmov x10, d0
				; CHECK-NEXT: mov x9, v0.d[1]
				; CHECK-NEXT: mul x10, x8, x10
				; CHECK-NEXT: mul x8, x8, x9
				; CHECK-NEXT: fmov d0, x10
				; CHECK-NEXT: mov v0.d[1], x8
				; CHECK-NEXT: ret
				entry:
				%in = zext i16 %src to i64
				%ext.b = zext <2 x i16> %b to <2 x i64>
				%broadcast.splatinsert = insertelement <2 x i64> undef, i64 %in, i64 0
				%broadcast.splat = shufflevector <2 x i64> %broadcast.splatinsert, <2 x i64> undef, <2 x i32> zeroinitializer
				%out = mul nuw <2 x i64> %broadcast.splat, %ext.b
				ret <2 x i64> %out
				}

				; dupsext_v4i8_v4i16
				; dupsext_v2i8_v2i32
				; dupsext_v4i8_v4i32
				; dupsext_v2i8_v2i64
				; dupsext_v2i16_v2i32
				; dupsext_v2i16_v2i64
				; dupzext_v2i8_v2i16
				; dupzext_v4i8_v4i16
				; dupzext_v2i8_v2i32
				; dupzext_v4i8_v4i32
				; dupzext_v2i8_v2i64
				; dupzext_v2i16_v2i32
				; dupzext_v2i16_v2i64

				; Unsupported states

				define <8 x i16> @nonsplat_shuffleinsert(i8 %src, <8 x i8> %b) {
				; CHECK-LABEL: nonsplat_shuffleinsert:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sxtb w8, w0
				; CHECK-NEXT: sshll v0.8h, v0.8b, #0
				; CHECK-NEXT: dup v1.8h, w8
				; CHECK-NEXT: mul v0.8h, v1.8h, v0.8h
				; CHECK-NEXT: ret
				entry:
				%in = sext i8 %src to i16
				%ext.b = sext <8 x i8> %b to <8 x i16>
				%broadcast.splatinsert = insertelement <8 x i16> undef, i16 %in, i16 1
				%broadcast.splat = shufflevector <8 x i16> %broadcast.splatinsert, <8 x i16> undef, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1>
				%out = mul nsw <8 x i16> %broadcast.splat, %ext.b
				dmgreen: I think this could still count as a splat, as the elements are undef (and can happily take any value, including 0). It's probably fine to use an extra parameter and something like an ext shuffle:
				%broadcast.splat = shufflevector <8 x i16> %c, <8 x i16> undef, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1>
				That should hopefully test things like it not being a splat and the insert not existing.
				ret <8 x i16> %out
				}
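
				; A minimal sketch of the variant suggested in the review comment on
				; nonsplat_shuffleinsert, not part of the submitted test file. The
				; function name and the extra parameter %c are hypothetical. Because %c
				; is an ordinary argument, no lane of the shuffled vector is undef, so
				; the rotate cannot be treated as a splat, and there is no insertelement
				; for the combine to look through. CHECK lines are deliberately omitted:
				; they would need to be generated with utils/update_llc_test_checks.py.
				define <8 x i16> @nonsplat_shuffle_param(<8 x i16> %c, <8 x i8> %b) {
				entry:
				%ext.b = sext <8 x i8> %b to <8 x i16>
				%rot = shufflevector <8 x i16> %c, <8 x i16> undef, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1>
				%out = mul nsw <8 x i16> %rot, %ext.b
				ret <8 x i16> %out
				}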

				define <8 x i16> @missing_insert(<8 x i8> %b) {
				; CHECK-LABEL: missing_insert:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sshll v0.8h, v0.8b, #0
				; CHECK-NEXT: ext v1.16b, v0.16b, v0.16b, #4
				; CHECK-NEXT: mul v0.8h, v1.8h, v0.8h
				; CHECK-NEXT: ret
				entry:
				%ext.b = sext <8 x i8> %b to <8 x i16>
				%broadcast.splat = shufflevector <8 x i16> %ext.b, <8 x i16> undef, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1>
				%out = mul nsw <8 x i16> %broadcast.splat, %ext.b
				ret <8 x i16> %out
				}

Revision Contents

Diff 314904
