Page MenuHomePhabricator

[AArch64][SVE] Add SVE2 intrinsics for bit permutation & table lookup

Authored by kmclaughlin on Feb 20 2020, 9:43 AM.



Implements the following intrinsics:

  • @llvm.aarch64.sve.bdep.x
  • @llvm.aarch64.sve.bext.x
  • @llvm.aarch64.sve.bgrp.x
  • @llvm.aarch64.sve.tbl2
  • @llvm.aarch64.sve.tbx

The SelectTableSVE2 function in this patch is used to select the TBL2
intrinsic & ensures that the vector registers allocated are consecutive.

Diff Detail

Event Timeline

kmclaughlin created this revision.Feb 20 2020, 9:43 AM
Herald added a project: Restricted Project. · View Herald Transcript

Cheers for working on this @kmclaughlin!


What does _x mean here?

1222 ↗(On Diff #245676)

DL ;-)

3592 ↗(On Diff #245676)

NumVecs seems be always 2 in this patch. Will we need this to work for other values in the future too?

[Nit] 2 is a bit of a magic number here. What about 2 -> /*NumVecs=*/2


AFAIK, -asm-verbose=0 is not currently needed here (and you don't use it in the other test). There are 2 options:

  • Leave -asm-verbose=0 (guarantees that there are no comments in assembly) and additionally decorate every function that you define with nounwind (guarantees that no CFI directives are added). This way you can safely replace every instance of CHECK with CHECK-NEXT.
  • Remove -asm-verbose=0 and leave things as they are.
  • Removed NumVecs parameter from SelectTableSVE2 as the value is always the same (2)
  • Removed unnecessary -asm-verbose=0 from the RUN line of sve2-intrinsics-bit-permutation.ll
kmclaughlin marked 4 inline comments as done.Feb 21 2020, 6:49 AM

Thanks for reviewing this, @andwar!


_x indicates that this is an unpredicated intrinsic.

3592 ↗(On Diff #245676)

I agree that it's not very clear what 2 is used for here. As NumVecs will always be the same value for the tbl2 intrinsic and SelectTableSVE2 is unlikely to be used for anything else, I've removed it from the list of parameters & added a comment there to explain the value used.

sdesmalen added inline comments.Feb 21 2020, 9:34 AM
1226 ↗(On Diff #245835)

nit: You can just as well inline this value now.

1229 ↗(On Diff #245835)

nit: given that we know NumVecs == 2, you can write SmallVector<SDVDalue, 2>.
nit: How about N->ops().slice(1, 2) ?

1235 ↗(On Diff #245835)

nit: Maybe just:

ReplaceNode(N, CurDAG->getMachineNode(Opc, DL, VT, { RegSeq, N->getOperand(NumVecs + 1) });

We should test this with operands that are not already consecutive. %a and %b will come in as z0 and z1 by definition of the calling convention.
By adding a %dummy in between %a and %b, you can check that a mov is inserted to ensure both registers are consecutive.

efriedma added inline comments.Feb 21 2020, 10:13 AM
1220 ↗(On Diff #245835)

Is it possible to write this as a TableGen pattern? We manage for other variants of tbl (for example, ).

Addressed review comments:

  • Removed SelectTableSVE2 from AArch64ISelDAGToDAG.cpp and added tablegen patterns for the tbl2 intrinsic
  • Updated tests to use operands that are not consecutive to ensure that the result is still two consecutive registers
This revision is now accepted and ready to land.Feb 25 2020, 3:09 PM
This revision was automatically updated to reflect the committed changes.

Thanks for reviewing this, @sdesmalen & @efriedma!