Page MenuHomePhabricator

[AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup))
ClosedPublic

Authored by NickGuy on Nov 11 2020, 6:22 AM.

Details

Summary

Performing this rearrangement allows for existing patterns
to match against cases where operands are hoisted up to a loop
header (by LICM), allowing for optimisations such as umull/smull
codegen, while still allowing loop-invariant values to be hoisted
out of the loop afer other optimisations.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
dmgreen added inline comments.Nov 11 2020, 2:02 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15524

I'm not sure what this is doing. It should be sign extending from the smaller type with the correct number of vector lanes (either ExtOperand.getValueType() for a sext/zext or Operand.getOperand(1) for a SIGN_EXTEND_INREG, and something based on the mask for an AND). It them probably has to make sure that the new extend is a legal operation.

NickGuy updated this revision to Diff 308966.Dec 2 2020, 7:48 AM
NickGuy marked 6 inline comments as done.
NickGuy retitled this revision from [AArch64] Rearrange (dup(sext/zext)) to (sext/zext(dup)) to [AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup)).

Addressed comments, and changed to be more targeted. The rearrangement will only be performed if the pattern feeds into a mul instruction, and only if the type combination is valid for smull/umull folding.

llvm/test/CodeGen/AArch64/aarch64-matrix-smull.ll
4 ↗(On Diff #304529)

I've added some tests testing the behaviour for different types and sizes (all generated, so the IR is identical apart from the types). I've omitted support for scalable types, as I encountered some issues when testing them. I plan to make progress with the fixed types first, then revisit scalable types later (Unless it turns out that to support them, I'm missing 1 line somewhere).

NickGuy updated this revision to Diff 308975.Dec 2 2020, 8:15 AM

Removed redundant/useless debug prints that were erroneously included in the patch.

Sorry for the delay. There is a lot of code here all of a sudden. More than I expected!

Can you run clang-format on the patch to make it more readable?

NickGuy updated this revision to Diff 309844.Dec 7 2020, 2:25 AM

Can you run clang-format on the patch to make it more readable?

Done, sorry about that (Guess I should have another look at my pre-commit hooks)

NickGuy updated this revision to Diff 309909.Dec 7 2020, 7:17 AM

Fixed more formatting issues (that were apparently missed by clang-format before?)

Thanks, that makes it clearer at least.

There's quite a bit of code here. I was hoping this would be a lot simpler, and it would be good if some of this could be simplified (but a lot of this might well be needed for all the cases that are being supported?). It makes it more difficult to check all the things it's doing.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11788

There is a convenience function, DAG.getUNDEF(VT);

Probably doesn't need a lambda as a result.

11820

Debug messages are fairly uncommon in ISel Lowering. It tends to trigger a lot of times where they are not important, and the debug messages it prints already are usually enough to give a rough indication of what is going on.

This whole function probably simplifies a lot I think? Is it something like TargetType.getHalfNumVectorElementsVT == PreExtendVT and the TargetType is one of {...}? I'm not sure how the MVT::v16i8 extended to MVT::v8i16 works out, from ISel. I think because ISel is type checked, those cases would not come up even if the underlying instructions supported it? We are not optimizing smull2 as far as I understand.

11851

Do we need to check both shuffles and DUP's? It that for the i64 types that are otherwise illegal somehow?

11876

A mul should always be an integer I think?

11901–11902

This comment is very short (as in - the length of the line). It probably doesn't need the newline before it either.

11909

Multiplies (and other BinOps) only have 2 operands. It would probably be simpler to just check both the operands as opposed to needing the loop.

11918

This also seems to make changes without necessarily detecting that they are useful. That is probably fine here considering what is being transformed, but it can be better to represent it as "test if it will help, if so then make the change".

NickGuy updated this revision to Diff 311506.Dec 14 2020, 12:26 AM
NickGuy marked 5 inline comments as done.
NickGuy added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11788

Ah, thanks for pointing that out.
Lambda is removed, too.

11820

Debug messages are fairly uncommon in ISel Lowering

Removed the debug messages.

11851

No, we don't. Looks like checking for shuffles catches the same DUP cases before they become DUPs. I've removed the DUP-specific check, and have unified it down the VectorShuffle check

11876

That was a remnant of before we were restricting this to muls only. Regardless, this has now been removed.

11909

It would probably be simpler to just check both the operands as opposed to needing the loop.

I disagree. Having the loop here makes it clearer that each operand is being handled in the same way, while having them separated needlessly duplicates the code.

NickGuy updated this revision to Diff 311840.Dec 15 2020, 1:37 AM

Fixed broken commit

NickGuy updated this revision to Diff 311853.Dec 15 2020, 3:08 AM

Fixed formatting, and removed "Unsupported combines" tests as they were failing due to a separate issue (and contribute very little value to this patch)

Fixed formatting, and removed "Unsupported combines" tests as they were failing due to a separate issue (and contribute very little value to this patch)

It might be worth keeping one or two to show it doesn't do anything wrong/crash on incorrect types.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11708

Add a message explaining what this function does. It looks generally useful.

It might also be worth having this function return the PreExtendType instead, and letting the parent function do the extend to vector type. Either way is fine but it might allow the function to be used for other things, if they come up.

11732

Can remove debug here and below.

11754

Nit: Doesn't need a break like this after a return.

11757

I think this can be PostExtendType.changeVectorElementType(PreExtendType);

11763

SDNode -> SDValue

11772

I may be wrong, but is the PreExtendVT != MVT::v16i8 half of this ever used? TargetType and PreExtendVT should have the same number of vector elements I believe. And then TargetType.getScalarSizeInBits() == 2 * PreExtendVT.getScalarSizeInBits() maybe?

That might help this lambda simplify further, possibly to the point that it is easier to inline it, now that it is only used in one place.

11805

Nit: DebugLoc is usually called just DL (I think DebugLoc is already the name of the type used in Machine Instructions).

11809

Nit: It doesn't need the {} around the operands, there are overloaded methods to handle that automatically already.

11816

It may be simpler to generate the AArch64ISD::DUP directly? I'm not sure either way which is better, but it's less nodes and we know it's going to get there eventually. Be careful about illegal types though.

11909

It would seem simpler (and smaller) to just call performCommonVectorExtendCombine for each operand. It may need to use something like Op0 ? Op0 : Mul->getOperand(0), but it would remove the need to track Changed and most of the rest of this loop.

NickGuy updated this revision to Diff 312207.Dec 16 2020, 7:14 AM
NickGuy marked 12 inline comments as done.

Addressed comments, and re-added a couple of "Unsupported combines" tests.

Thanks for the updates, this is looking good.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11720

SDNode* -> SDValue

11826–11830

I think this will always produce a new value, as opposed to returning SDValue when nothing is done. It's probably best to make it something like this instead:

SDValue Op0 = performCommonVectorExtendCombine(Mul->getOperand(0), DAG);
SDValue Op1 = performCommonVectorExtendCombine(Mul->getOperand(1), DAG);
if (!Op0 && !Op1)
  return SDValue();

SDLoc DL(Mul);
return DAG.getNode(Mul->getOpcode(), DL, Mul->getValueType(0),
                   Op0 ? Op0 : Mul->getOperand(0),
                   Op1 ? Op1 : Mul->getOperand(1));
NickGuy updated this revision to Diff 312736.Dec 18 2020, 2:56 AM
NickGuy marked 2 inline comments as done.

Addressed Comments

NickGuy added inline comments.Dec 18 2020, 3:02 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11772

Not entirely sure it is, I added that check to explicitly allow the types accepted by u/smull. I've changed this now.

11816

Generating the DUP directly, while simpler at this point, failed due to this combine being performed fairly early, and the DUP not being handled as part of lowering. Given that, keeping it as shuffle_vector at this point seems simpler, as we can also benefit from the existing VECTOR_SHUFFLE->DUP lowering checks.

NickGuy updated this revision to Diff 312773.Dec 18 2020, 5:49 AM

Fixed incorrect error-handling

dmgreen added inline comments.Mon, Dec 21, 12:32 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11721

const SDValue &ExtOperand -> SDValue ExtOperand

Its also only used in one place and needn't get a variable.

11730

const SDValue &TypeOperand -> SDValue TypeOperand

11768

This doesn't seem to be checking that the shuffle/insert are actually a splat.

There is an isSplatValue method that could help, depending on if it's really checking what this wants to check.

NickGuy updated this revision to Diff 314185.Thu, Dec 31, 5:01 AM
NickGuy marked 3 inline comments as done.
NickGuy added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11768

Done, though I wasn't able to produce IR that represented a shuffle/insert that wasn't a splat, so I haven't got any tests for this case.

NickGuy updated this revision to Diff 314550.Tue, Jan 5, 2:29 AM

Added test case for non-splat shuffles

dmgreen added inline comments.Tue, Jan 5, 8:34 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11717

Although (I know..) I suggested this, it does more than what we might expect here. And there's the chance it could change in any number of weird and wonderful ways in the future. As we are relying on this being a vector shuffle with an insert element and a zero mask, I think we should check for that. I think it should be enough to test that:

  • The shuffle vector has a zero mask (ShuffleVectorSDNode(VectorShuffle)->isSplat(), getSplatIndex()==0)
  • The first first operand is an insert
  • That is inserting into lane 0.
llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
169

I think this could still count as a splat, as the elements are undef (can happily take any value, including 0).
It's probably fine to use a extra parameter and something like an ext shuffle:
%broadcast.splat = shufflevector <8 x i16> %c, <8 x i16> undef, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1>
That should hopefully test things like it not being a splat and the insert not existing.

NickGuy updated this revision to Diff 314861.Wed, Jan 6, 4:59 AM
NickGuy marked 2 inline comments as done.

Addressed comments, updating the checks and tests

dmgreen accepted this revision.Wed, Jan 6, 5:44 AM

Thanks. LGTM with a couple of suggestions.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11762

You can probably remove this now and just rely on the dyn_cast<ShuffleVectorSDNode> check below.

11772

Maybe move this comment up above the if, to avoid the misleading indentation.

11781–11787

Maybe something like this?

// Ensures the insert is inserting into lane 0
auto *Constant = dyn_cast<ConstantSDNode>(InsertLane.getNode());
if (!Constant || Constant->getZExtValue() != 0)
  return SDValue();
This revision is now accepted and ready to land.Wed, Jan 6, 5:44 AM
This revision was automatically updated to reflect the committed changes.
NickGuy marked 3 inline comments as done.
dmajor added a subscriber: dmajor.Thu, Jan 7, 7:20 AM

This commit caused a crash on our bots with a null LLVMTy at changeExtendedVectorElementType. Reduced case: https://godbolt.org/z/7ehffx

This commit caused a crash on our bots with a null LLVMTy at changeExtendedVectorElementType. Reduced case: https://godbolt.org/z/7ehffx

Thanks for bringing this to my attention, I've got a fix in review at D94234.