Page MenuHomePhabricator

[AArch64][SVE] Use SVE for VLS fcopysign for wide vectors

Authored by DavidTruby on Jun 27 2022, 7:02 AM.



Currently fcopysign for VLS vectors lowers through NEON even when the
vector width is wider than a NEON vector, causing bad codegen as the
vectors are split. This patch causes SVE to be used for these vectors
instead, giving much better codegen on wide VLS vectors.

Diff Detail

Unit TestsFailed

60,110 msx64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::scariness_score_test.cpp
Script: -- : 'RUN: at line 4'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -shared-libasan -O0 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/asan/TestCases/scariness_score_test.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/asan/X86_64LinuxDynamicConfig/TestCases/Output/scariness_score_test.cpp.tmp
60,160 msx64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
Script: -- : 'RUN: at line 4'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -O0 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/asan/TestCases/scariness_score_test.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Output/scariness_score_test.cpp.tmp
60,050 msx64 debian > MLIR.Examples/standalone::test.toy
Script: -- : 'RUN: at line 1'; /usr/bin/cmake /var/lib/buildkite-agent/builds/llvm-project/mlir/examples/standalone -G "Ninja" -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_C_COMPILER=/usr/bin/clang -DLLVM_ENABLE_LIBCXX=OFF -DMLIR_DIR=/var/lib/buildkite-agent/builds/llvm-project/build/lib/cmake/mlir -DLLVM_USE_LINKER=lld

Event Timeline

DavidTruby created this revision.Jun 27 2022, 7:02 AM
Herald added a project: Restricted Project. · View Herald Transcript
DavidTruby requested review of this revision.Jun 27 2022, 7:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2022, 7:02 AM

FYI, if I add -mattr=+sve2 to your test arguments, I get:

LLVM ERROR: Cannot select: t17: v16i16 = AArch64ISD::BSP t43, t35, t32

Fix expansion for VLS on SVE2

When checking the output for SVE2 I see no difference, which means we're missing out on the BSL optimisation we get for scalable vectors. I think this is because you're handling the fixed->scalable lowering too late. I think you really need to edit LowerFCOPYSIGN to first convert the fixed length ISD::FCOPYSIGN to a scalable one, then let the existing scalable vector code decide how best to lower it.


Rather than have this dangling there's a large ordered/sorted block further down.


I think you mean VT.isScalableVector() here. However...

Given this bug fix it makes me wonder if the following code was ever excised before this patch? Which given my SVE2 comment I'm think we can in fact keep the original code and just remove the fixedSVEVectorVT code?

Matt added a subscriber: Matt.Jun 28 2022, 2:03 PM

Rework patch to use VLA lowering for the VLS types.

paulwalker-arm added inline comments.Jun 30 2022, 9:47 AM

This doesn't look safe with respect to the extend/rounding code just below. When faced with differing types the result from both convertToScalableVector called will be a type of the same size. However their element counts will be different. For example take the case:

fcopysign v8f64, v8f32

this will resulting in:

In1 = nxv2f64
In2 = nxv4f32

which I doubt the remaining logic will handle properly. The most likely affect being a getNode assert firing for invalid operands.

My guess is that you're not seeing this because In1 and In2 always have the same type and indeed I couldn't immediate see a way to exercise this logic. I think this means your "mixtype" tests are likely exercising nothing new and are redundant. This is likely also true for you original patch when you added the initial scalable vector support. If they are not exercising this code as I suspect then you either need to rewrite them or just remove them if there's no actually route to test this logic.

Personally I think the safest route is to simply rewrite the fixed length fcopysign into a scalable vector one after any necessary extending/rounding of the input has taken place.

For what it's worth I also think the use of FP_EXTEND/FP_ROUND is not the most efficient way to get the sign bits to align but that can be changed later.


Isn't this original code now fine and you instead just need to remove the following

// Don't expand for NEON
if (VT.isFixedLengthVector())
  return SDValue();

block because that is covered by the !VT.isScalableVector() check?

DavidTruby updated this revision to Diff 442061.Jul 4 2022, 3:50 AM

Move VLS handling after ROUND/EXTEND

DavidTruby added inline comments.Jul 4 2022, 3:52 AM

I believe I've corrected this now; I think you're right that the inputs will always be the same type anyway though. I agree that it is safer to leave the handling in just in case that does get triggered.
I think it's better to leave the mixed type tests in as is, just in case something changes in future and the types coming into this function could be different we want to make sure we don't regress in that case.

peterwaller-arm accepted this revision.Jul 4 2022, 4:14 AM

Looking generally good but I see some possible minor improvements/cleanup.


Nit: Does isFixedSVE want to move down with the use?


Is this line necessary or could it be pushed up? At a glance it appears it should already be an integer VT derived from VT. Same question for the VT assignment.

This revision is now accepted and ready to land.Jul 4 2022, 4:14 AM
DavidTruby added inline comments.Jul 4 2022, 5:24 AM

From 7593-4: VT and IntVT will be scalable containers for the fixed length vector types. Here we need to get the original VTs back.

peterwaller-arm requested changes to this revision.Jul 18 2022, 5:50 AM

Requesting changes to deal with the mixed-type combine/tests, since we have found a case where the types can be different.

This revision now requires changes to proceed.Jul 18 2022, 5:50 AM

Add flag to test FCOPYSIGN nodes with differing argument types.

This patch now depends on D130370 as a result.

paulwalker-arm added inline comments.Jul 28 2022, 4:30 PM
15405 ↗(On Diff #448315)

What about return EnableVectorFcopysignExtendRound;?

3597 ↗(On Diff #448315)

By this point we know the result type is legal because results are legalised before operands. What's important here is the result type remains legal after splitting the operands. Given the result and first operands have the same type this means ensuring the types of LHSLo and LHSHi are legal after splitting.

There's a function GetSplitDestVTs which returns the types expected from splitting. I mention this because I think it's better to query the expected types are legal before performing the actual splitting.

DavidTruby added inline comments.Jul 28 2022, 4:36 PM
3597 ↗(On Diff #448315)

Ah ok I think I was considering this wrong, I thought that the result type of the concat (which is the result type of the original FCOPYSIGN) needed to be legal for us to do the transform

If that's already legal, is there a problem? Is there a case where splitting an already legal vector in two would make a vector illegal? (genuine question I'm not sure when this would pop up)

Or do we need RHSLo to be legal?

paulwalker-arm added inline comments.Jul 28 2022, 4:51 PM
3597 ↗(On Diff #448315)

You can have multiple legal types for the same vector element type. For NEON v4f32 and v2f32 are legal. So it is possible for the result type to be legal and yet still be legal after splitting. Likewise v1f32 is not legal for NEON and so it is possible to enter with a legal type that would become illegal when split.

For the former case we can split the operation in two as you've done. For the latter we're better reverting to the original code path of calling UnrollVector. So generally what you've done is fine, it is just you're checking the wrong type (i.e. N's result type rather than the expected result type of the new FCOPYSIGN operations). Plus my comment that you probably want to use GetSplitDestVTs so you only call SplitVector for the cases that are safe.

Fix validity check for FCOPYSIGN legalization

paulwalker-arm added inline comments.Aug 1 2022, 10:14 AM
138 ↗(On Diff #448640)

Up to you but I think EnableVectorFCopySignExtendRound looks better.

140 ↗(On Diff #448640)


3612 ↗(On Diff #448640)


3614 ↗(On Diff #448640)



Not new but can this be removed? as it can never happen given the SrcVT.bitsLT/SrcVT.bitsGT code above.


This can be assumed, plus getContainerForFixedLengthVector will ensure the type is legal anyway.


Bookending the fixed length lowering like this has pitfalls and can complicate the code. It's better to just rewrite the fixed length operations using scalable vector types and then let the scalable vector lowering handle any complexity. Towards the start of the function you can do:

EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);
In1 = convertToScalableVector(DAG, ContainerVT, In1);
In2 = convertToScalableVector(DAG, ContainerVT, In2);
Res = getNode(ISD::FCOPYSIGN, ContainerVT , In1, In2)
return convertFromScalableVector(DAG, ContainerVT, Res);

This way it doesn't matter how complicated the scalable vector lowering gets. Doing this also means you no longer need sve2-fixed-length-fcopysign.ll because there's nothing SVE2 special about the lowering code you've added (i.e. the original sve2-fcopysign.ll tests are good enough to protect that functionality).

DavidTruby updated this revision to Diff 449254.Aug 2 2022, 3:58 AM

Changed fixed-length lowering to rely on scalable lowering.
Removed redundant code.

paulwalker-arm accepted this revision.Aug 3 2022, 10:32 AM

Documentation for combiner-vector-fcopysign-extend-round needs updating but otherwise looks good.

141 ↗(On Diff #449254)

Please drop this part of the documentation. Although this is why you've added the flag, it is not the only reason somebody might want to use it (i.e. somebody might actually want to enable the optimisation).

3615–3618 ↗(On Diff #449254)

You could just return DAG.getNode(....


Bogus blank line.

peterwaller-arm accepted this revision.Aug 9 2022, 4:00 AM
This revision is now accepted and ready to land.Aug 9 2022, 4:00 AM