Page MenuHomePhabricator

[AArch64][SVE] Zero-overhead transfer between Neon and SVE registers
Changes PlannedPublic

Authored by peterwaller-arm on Jul 19 2021, 5:00 AM.

Details

Summary

A pattern for moving data from a Neon ACLE type into an SVE ACLE type
involves extracting the two double-lanes of the Neon register and
inserting them into an SVE register using two DUPs with VL1 and VL2.

This must compile to a NOP.

To achieve this, this patch adds support in DAGCombine to support the
INSERT_VECTOR_ELT => BUILD_VECTOR combine. Since BUILD_VECTOR does not
support scalable vectors, the insertions are pushed into a fixed
BUILD_VECTOR through an INSERT_SUBVECTOR to make it scalable again.

With this DAGCombine in place, existing BUILD_VECTOR combines are able
neatly optimize away bitcast/extractelement/shuffle etc.

Since not all Scalable vector types are supported for INSERT_SUBVECTOR,
I introduce a TargetLoweringInfo::isInsertSubvectorLegal to query
whether to perform the combine.

Two dup => insertelement patterns are added in instCombineSVEDup:

(dup vec VL1 elem0)
=> (insertelement vec elem0 0)

(dup (dup vec VL2 elem1) VL1 elem0)
=> (insertelement (insertelement vec elem1 1) elem0 0)

... which enable the BUILD_VECTOR optimization to work.

Reference:

"Move data between Advanced SIMD (Neon) and SVE ACLE types"
https://developer.arm.com/documentation/ka004612/latest
KA004612

Diff Detail

Unit TestsFailed

TimeTest
2,610 msx64 debian > libarcher.barrier::barrier.c
Script: -- : 'RUN: at line 14'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/barrier/barrier.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/barrier/Output/barrier.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/barrier/Output/barrier.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/barrier/Output/barrier.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/barrier/barrier.c
2,880 msx64 debian > libarcher.critical::critical.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c
2,610 msx64 debian > libarcher.parallel::parallel-simple2.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple2.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple2.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple2.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple2.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple2.c
2,890 msx64 debian > libarcher.races::critical-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c
2,800 msx64 debian > libarcher.races::lock-nested-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c
View Full Test Results (18 Failed)

Event Timeline

peterwaller-arm requested review of this revision.Jul 19 2021, 5:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 19 2021, 5:00 AM
peterwaller-arm planned changes to this revision.Jul 19 2021, 5:04 AM

BRB later today with fixes. Early comments welcomed on the approach.

llvm/test/CodeGen/AArch64/dag-combine-insert-elt.ll
14

Accidental double-insert-into-undef here.

27

And here.

  • Fix dag-combine-insert-elt.ll tests
  • Remove extraneous blankline

Marking inlines done.

Can you split the instcombine changes into a separate commit? Or are they tied together in some way I'm missing?

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-dup.ll
1

Please regenerate in a separate commit.

david-arm added inline comments.Jul 20 2021, 12:12 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.h
647

Can use getVectorMinNumElements here I think.

peterwaller-arm marked an inline comment as done.Jul 20 2021, 1:49 AM

Can you split the instcombine changes into a separate commit? Or are they tied together in some way I'm missing?

Happy to split them as appropriate. However, I seek feedback on the overall approach. I believe that the instcombine changes on their own constitute a regression, which is why I kept the changes in a single patch. I will hold off splitting for a moment since I want to make sure the approach is as good as we can get. Any advice would be appreciated.

One criticism I have received surrounds the introduced TLI query isInsertSubvectorLegal, which seems inelegant. The problem is that BUILD_VECTOR creation is a generic DAGCombine, so I don't know what INSERT_SUBVECTORs can be lowered by the backend. Unfortunately <vscale x 2 x half> is marked legal, so an isTypeLegal or isOperationLegalOrCustom returns 'true', but the backend is incapable of lowering it. So one alternative solution might be to improve the INSERT_SUBVECTOR lowering.

The method as it is produces good patterns [such as (insert_subvector undef (scalar_to_vector))] but it's not possible to match them in TableGen AIUI because of the mix of scalable/fixed types, and I wasn't yet able to find something to match those patterns on as a DAGCombine which didn't result in an infinite loop. So I guess the next stop is to improve the behaviour in LowerOperation.

llvm/lib/Target/AArch64/AArch64ISelLowering.h
647

Discussed offline -- getVectorMinNumElements doesn't make sense as it would depend on the input type whereas the test is meant to match the assert in DAGToDAG insertSubReg.

peterwaller-arm planned changes to this revision.Jul 20 2021, 9:12 AM
peterwaller-arm marked an inline comment as done.

An alternative approach is being considered. Feel free to resign as reviewers if you want to remove this from your queue and I will re-request in the future as appropriate.