Page MenuHomePhabricator

[AArch64][SelectionDAG] Add legalization for widening LOAD/MLOAD.
Needs ReviewPublic

Authored by efriedma on Jul 14 2021, 5:32 PM.

Details

Summary

By the magic of masked loads, a widened MLOAD is almost identical to the original MLOAD.

Need to handle a few more INSERT_SUBVECTOR legalization cases to avoid crashing on the testcase.

The code for computing the mask is unfortunately not very efficient; maybe we need a target-independent version of whilelo? Or a DAGCombine to form whilelo?

Diff Detail

Unit TestsFailed

TimeTest
3,190 msx64 debian > libarcher.critical::critical.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c
2,900 msx64 debian > libarcher.races::critical-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c
2,980 msx64 debian > libarcher.races::lock-nested-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c
3,280 msx64 debian > libarcher.races::lock-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-unrelated.c
3,070 msx64 debian > libarcher.races::parallel-simple.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/parallel-simple.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/parallel-simple.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/parallel-simple.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/parallel-simple.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/parallel-simple.c
View Full Test Results (16 Failed)

Event Timeline

efriedma created this revision.Jul 14 2021, 5:32 PM
efriedma requested review of this revision.Jul 14 2021, 5:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2021, 5:32 PM
sdesmalen added inline comments.Jul 19 2021, 1:14 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4056

Do we also want to do this for other element-counts that are not a power of 2? (e.g. <vscale x 6 x i8>)

Was this code intended to work for <vscale x 1 x eltty> ? (I tried that a few days ago with this patch and ran into some failures. I didn't really investigate further though)

4057

nit: VT.isKnownMultipleOf(2)

4144

Should MemoryLocation::UnknownSize only be used for the scalable case?

efriedma added inline comments.Jul 19 2021, 1:34 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4056

Should work for 1, I guess? Not sure I actually tried it. But note we only support a very limited set of operations. See also D105591.

For <vscale x 6 x i8>, we can use a different sequence: generate extloads for <vscale x 2 x i64> and <vscale x 4 x i32>, and merge them together. GenWidenVectorLoads should handle that in theory; not sure how well it works in practice.

I'll add more testcases, in any case.

4144

There's a general restriction on memoperands in selectiondag: the size of the operand must be at least the size of the memory VT. I didn't really want to think about precisely what MemoryLocation we could be using instead...

efriedma updated this revision to Diff 359931.Jul 19 2021, 3:02 PM

More tests. Make "<vscale x 1 x i32>" loads work.

Matt added a subscriber: Matt.Jul 20 2021, 6:48 AM

Apologies for the delay in reviewing this, it has been on my to-do list for quite a while!

Thanks for adding support for <vscale x 1 x eltty> types and adding a test for widening the even-numbered ElementCounts (e.g. <vscale x 6 x i16>), it's nice to see that this works too.

Other than some minor comments, the patch looks mostly fine to me.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4881–4883

nit: Is/would this be the same as:

SDValue Op0 = N->getOperand(0);
TLI.getTypeToTransformTo(*DAG.getContext(), Op0.getValueType());

?

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4011

This code is identical to the block on lines 4022-4026, with the exception of s/NumElements/MinNumElements. Can you align the implementation in the two blocks and remove the if (VT.isScalableVector())? (and then move the report_fatal_error to line 4027)

4017

nit: unnecessary curly braces.

4056

Is the FIXME still relevant?

For <vscale x 6 x i8>, we can use a different sequence: generate extloads for <vscale x 2 x i64> and <vscale x 4 x i32>, and merge them together. GenWidenVectorLoads should handle that in theory; not sure how well it works in practice.
I'll add more testcases, in any case.

Nice. From what I can see from the test you've added, <vscale x 6 x i16> is handled correctly by GenWidenVectorLoads.

4062

nit: can you move WidenVT closer to where it's used? (above line 4079)

llvm/test/CodeGen/AArch64/sve-split-load.ll
72

sve-split-load.ll may no longer be the best name for this file given that these new tests are about widening?

efriedma added inline comments.Aug 3 2021, 1:40 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4881–4883

No, it's not the same.

We need to extend the element VT to the element VT of the result type. For SVE, the resulting vector type is usually promotable. If it is, we deal with it in PromoteIntOp_CONCAT_VECTORS.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4056

Yes, it's still an issue. The constant "2" is specific to SVE.

The question we need to figure out here is, will GenWidenVectorLoads succeed (and be profitable)? For SVE, it should succeed for any multiple of 2, because we can use 64-bit extending loads. Not sure about profitability for SVE; the generated code is a bit messy, but the messy bits are all invariant (and probably less messy once we start using whilelo optimally).

Other vector instruction sets might have different restrictions.