This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Zero-overhead transfer between Neon and SVE registers
AbandonedPublic

Authored by peterwaller-arm on Jul 19 2021, 5:00 AM.

Details

Summary

A pattern for moving data from a Neon ACLE type into an SVE ACLE type
involves extracting the two double-lanes of the Neon register and
inserting them into an SVE register using two DUPs with VL1 and VL2.

This must compile to a NOP.

To achieve this, this patch adds support in DAGCombine to support the
INSERT_VECTOR_ELT => BUILD_VECTOR combine. Since BUILD_VECTOR does not
support scalable vectors, the insertions are pushed into a fixed
BUILD_VECTOR through an INSERT_SUBVECTOR to make it scalable again.

With this DAGCombine in place, existing BUILD_VECTOR combines are able
neatly optimize away bitcast/extractelement/shuffle etc.

Since not all Scalable vector types are supported for INSERT_SUBVECTOR,
I introduce a TargetLoweringInfo::isInsertSubvectorLegal to query
whether to perform the combine.

Two dup => insertelement patterns are added in instCombineSVEDup:

(dup vec VL1 elem0)
=> (insertelement vec elem0 0)

(dup (dup vec VL2 elem1) VL1 elem0)
=> (insertelement (insertelement vec elem1 1) elem0 0)

... which enable the BUILD_VECTOR optimization to work.

Reference:

"Move data between Advanced SIMD (Neon) and SVE ACLE types"
https://developer.arm.com/documentation/ka004612/latest
KA004612

Diff Detail

Event Timeline

peterwaller-arm requested review of this revision.Jul 19 2021, 5:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 19 2021, 5:00 AM
peterwaller-arm planned changes to this revision.Jul 19 2021, 5:04 AM

BRB later today with fixes. Early comments welcomed on the approach.

llvm/test/CodeGen/AArch64/dag-combine-insert-elt.ll
13

Accidental double-insert-into-undef here.

26

And here.

  • Fix dag-combine-insert-elt.ll tests
  • Remove extraneous blankline

Marking inlines done.

Can you split the instcombine changes into a separate commit? Or are they tied together in some way I'm missing?

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-dup.ll
2

Please regenerate in a separate commit.

david-arm added inline comments.Jul 20 2021, 12:12 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.h
647

Can use getVectorMinNumElements here I think.

peterwaller-arm marked an inline comment as done.Jul 20 2021, 1:49 AM

Can you split the instcombine changes into a separate commit? Or are they tied together in some way I'm missing?

Happy to split them as appropriate. However, I seek feedback on the overall approach. I believe that the instcombine changes on their own constitute a regression, which is why I kept the changes in a single patch. I will hold off splitting for a moment since I want to make sure the approach is as good as we can get. Any advice would be appreciated.

One criticism I have received surrounds the introduced TLI query isInsertSubvectorLegal, which seems inelegant. The problem is that BUILD_VECTOR creation is a generic DAGCombine, so I don't know what INSERT_SUBVECTORs can be lowered by the backend. Unfortunately <vscale x 2 x half> is marked legal, so an isTypeLegal or isOperationLegalOrCustom returns 'true', but the backend is incapable of lowering it. So one alternative solution might be to improve the INSERT_SUBVECTOR lowering.

The method as it is produces good patterns [such as (insert_subvector undef (scalar_to_vector))] but it's not possible to match them in TableGen AIUI because of the mix of scalable/fixed types, and I wasn't yet able to find something to match those patterns on as a DAGCombine which didn't result in an infinite loop. So I guess the next stop is to improve the behaviour in LowerOperation.

llvm/lib/Target/AArch64/AArch64ISelLowering.h
647

Discussed offline -- getVectorMinNumElements doesn't make sense as it would depend on the input type whereas the test is meant to match the assert in DAGToDAG insertSubReg.

peterwaller-arm planned changes to this revision.Jul 20 2021, 9:12 AM
peterwaller-arm marked an inline comment as done.

An alternative approach is being considered. Feel free to resign as reviewers if you want to remove this from your queue and I will re-request in the future as appropriate.

peterwaller-arm abandoned this revision.Oct 26 2021, 4:00 AM

Thanks everyone who helped with this.

Instead of the approach in this review, the new intent is to add intrinsics to the Arm C Language Extensions (ACLE) to directly allow the user to perform this conversion through an intrinsic which will directly resolve to insert/extract subvector.

The proposal for that can be found here: https://github.com/ARM-software/acle/pull/72