This is an archive of the discontinued LLVM Phabricator instance.

[AArch64-SVE]: Force generating code compatible to streaming mode for sve-fixed-length tests.
AbandonedPublic

Authored by hassnaa-arm on Oct 27 2022, 9:40 AM.

Details

Summary

Add testing files and enable streaming mode flag for:

bit-counting.ll
bitselect.ll
insert-vector-elt.ll
subvector.ll
vector-shuffle.ll
int-immediates.ll
int-minmax.ll
int-reduce.ll
trunc.ll
int-compare.ll
int-vselect.ll
mask-opt.ll
masked-scatter.ll
masked-gather.ll
fp-compares.ll
fp-extend-trunc.ll
addressing-modes.ll
fp-arith.ll
int-select.ll
log-reduce.ll
ld2-alloca.ll
    
Add needed changes to force generateing code compatible to streaming mode:
1- enable custom lowering for CTLZ and CTPOP, (needed for bit-counting.ll test).
2- enable custom lowering for insert_vector_elt, (needed for insert-vector-elt.ll test).
3- enable custom lowering for vector SETCC, (needed for subvector.ll and int-compare.ll tests).
4- enable custom lowering for SMIN, SMAX, UMIN, UMAX, (needed for int-minmax.ll and int-immediates.ll tests).
5- enable custom lowering for vecreduce_smin/smax/umin/umax/add, (needed for int-reduce).
6- enable custom lowering for truncate, (needed for trunc.ll)
7- enable custome lowering for truncStore, (needed for fp-extend-trunc.ll).
8- enable expanding setueq to avoid custom-lowering setcc to setcc_merge_zero which cause a crash while instruction     selection because there is no pattern match for it, (that is needed for fp-compares.ll)
9- disable combining OR into BSL, (needed for bit-select.ll test).
10- disable lowering interleaved load to avoid generating invalid neon intrinsic, (needed for ld2-alloca.ll).
11- use SVE OR instruction instead of NEON OR, during copying phyReg -AArch64InstrInfo::copyPhysReg-, (needed for vector-shuffle).
12- force scalarisation for masked gather/scatter, because they are not supported in streaming mode.

Diff Detail

Event Timeline

hassnaa-arm created this revision.Oct 27 2022, 9:40 AM
hassnaa-arm requested review of this revision.Oct 27 2022, 9:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 27 2022, 9:40 AM
hassnaa-arm added inline comments.Oct 27 2022, 9:56 AM
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-vector-shuffle.ll
10

when I uncomment this test, llc returns this error:
invalid shufflevector operands:

%ret = shufflevector <4 x i8> %op1, <4 x i8> %op2, <4 x i32> <i32 7, i32 8, i32 9, i32 10>
68

when I uncomment this test, llc returns this error:
invalid shufflevector operands:

%ret = shufflevector <2 x i16> %op1, <2 x i16> %op2, <2 x i32> <i32 3, i32 4>

Add additional test cases and remove not needed ones from subvector.ll test file.

hassnaa-arm added inline comments.Oct 27 2022, 10:25 AM
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-bitselect.ll
12

I left that comment intentionally to choose which solution is better, the current solution (disable combining or into BSP), or implement SVE lowering for the BSP pseudoinst as the comment suggest.

48

Should I append additional test cases for this test file ?
It seems that the original test file -sve-fixed-length-bitselect.ll- tests specific case (for specific size).

sdesmalen added inline comments.Oct 27 2022, 10:39 AM
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-vector-shuffle.ll
10

That is because elements 8, 9 and 10 are out of bounds when you concatenate %op1 and %op2 (<=> 8 elements)

The follow does work for example:

%ret = shufflevector <4 x i8> %op1, <4 x i8> %op2, <4 x i32> <i32 3, i32 4, i32 5, i32 6>
Matt added a subscriber: Matt.Oct 28 2022, 1:26 PM
hassnaa-arm marked an inline comment as done.

Adding new testing files and required changes for generating streaming-compatible code for them.
int-immediates.ll
int-minmax.ll
int-reduce.ll
int-compares.ll
trunc.ll

Add new testing files:
int-vselect.ll
mask-opt.ll
masked-scatter.ll -has problems-.
masked-gather.ll -has problems-.

hassnaa-arm added inline comments.Nov 3 2022, 6:04 AM
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-gather.ll
6

This testing file is still in progress.
It crashes because of test cases of f16.

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-scatter.ll
2

This testing file is still in progress.
It crashes because of test cases of f16.

Add new testing files.
Add masked-gather.ll and masked-scatter.ll and force scalarisation for them.

hassnaa-arm edited the summary of this revision. (Show Details)Nov 3 2022, 10:09 AM
hassnaa-arm edited the summary of this revision. (Show Details)
Matt added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
265

I wonder, would it also make sense to do that for 128-bit vectors (regardless of the streaming mode) as a (temporary?) fix for https://github.com/llvm/llvm-project/issues/56412?
@hassnaa-arm, @paulwalker-arm: Thoughts?

paulwalker-arm added inline comments.Nov 3 2022, 11:15 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
265

@Matt : That's not a bug we can actually hit though is it? I mean, you have to edit the LLVM code in order to trigger the failure case?

Matt added inline comments.Nov 3 2022, 11:21 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
265

Indeed, I'm asking in the context assuming the modification of useSVEForFixedLengthVectors--curious whether a similar fix is applicable, given how similar that modification is to the forceStreamingCompatibleSVE special case (that's the only relation to this patch). Chances are it would only be needed for half/fp16, too.

paulwalker-arm added inline comments.Nov 3 2022, 11:52 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
265

I see, then yes. Doing this will force the intrinsic to be scalarised at the IR level and thus you will not hit the failure case within code generation. 56412 isn't just about 128bit vectors though, because those work today. It's really about restricting smaller than 64bit vectors (e.g. <2 x half>) when specially targeting SVE128. That said, I'd sooner fix the underlying issue :)

Matt added inline comments.Nov 3 2022, 12:02 PM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
265

Makes sense! (Would it be fair to say that the underlying issue is somewhere in SelectionDAG and the interaction of SVE128 and smaller vector?)

paulwalker-arm added inline comments.Nov 3 2022, 1:57 PM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
265

Yes. I'm pretty sure it's a legalisation hang where we keep bouncing between two different legalisation styles.

Matt added inline comments.Nov 3 2022, 1:59 PM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
265

All right, thanks!

Add new testing files.

Add new testing files:
zip-uzp-trn.ll
optimize-ptrue.ll
int-to-fp.ll
permute-rev.ll

Add testing files for fp.

Hi @hassnaa-arm, could you rebase this off D137093 please because I'd like to see whether or not this patch fixes up the gathers and scatters present in CodeGen/AArch64/sve-streaming-mode-fixed-length-addressing-modes.ll from that patch.

I've only reviewed about 1/3 of this patch so far, since it's so big! But I'm leaving the comments I have so far ...

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
3658

Given you're trying to mov SrcReg into DstReg I think this is incorrect for two reasons:

  1. You're marking the source registers as being Defined, which isn't right since they're only being read.
  2. The second source should also be SrcReg.

i.e. something like this:

BuildMI(MBB, I, DL, get(AArch64::ORR_ZZZ))
      .addReg(AArch64::Z0 + (DestReg - AArch64::Q0), RegState::Define)
      .addReg(AArch64::Z0 + (SrcReg - AArch64::Q0))
      .addReg(AArch64::Z0 + (SrcReg - AArch64::Q0))
llvm/test/CodeGen/AArch64/sve-fixed-length-int-reduce.ll
1111–1113

There is nothing technically wrong with these changes - we're getting the same result. I just wonder if we want to change the behaviour for NEON-like vectors when not in streaming mode?

@paulwalker-arm any thoughts?

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-to-int.ll
321 ↗(On Diff #473908)

I think we can remove this test because the input vector > 256 bits.

1070 ↗(On Diff #473908)

Remove this test, since input > 256 bits?

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-scatter.ll
2402

Given we know we're just going to scalarise this operation I wonder if there is much value in producing tests for large types? Perhaps we can just ignore tests for vectors that are > 256 bits?

sdesmalen added inline comments.Nov 8 2022, 7:01 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12894

Instead of updating the OverrideNEON variable, I suspect that you actually want to do something like this:

if (SrcVT.isScalableVector() ||
    useSVEForFixedLengthVectorVT(
        SrcVT, OverrideNEON && Subtarget->useSVEForFixedLengthVectors()) ||
    useSVEForFixedLengthVectorVT(
        SrcVT, Subtarget->forceStreamingCompatibleSVE()))

so to not alter the behaviour for non-streaming fixed-length vectors. This avoids the regressions in sve-fixed-length-int-reduce.ll where the SVE variants require an additional predicate (whereas the NEON reduction operations are unpredicated and thus only 1 instruction).

hassnaa-arm marked 4 inline comments as done.

revert changes added to sve-fixed-length-int-reduce.ll

hassnaa-arm added inline comments.Nov 10 2022, 7:50 AM
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-to-int.ll
321 ↗(On Diff #473908)

I left it because the output vector is not > 256.
So, for all cases, I leave it if one of the intput/output vector is not > 256

1070 ↗(On Diff #473908)

I left it because the output vector is not > 256.
So, for all cases, I leave it if one of the intput/output vector is not > 256

Add new testing files

hassnaa-arm edited the summary of this revision. (Show Details)Nov 11 2022, 9:18 AM
hassnaa-arm added a reviewer: paulwalker-arm.

Add new testing file: addressing-modes.ll

hassnaa-arm edited the summary of this revision. (Show Details)Nov 14 2022, 4:15 AM

Adding new testing file and its related changes.

hassnaa-arm edited the summary of this revision. (Show Details)Nov 15 2022, 6:04 AM

Hi @hassnaa-arm, I've not been able to go through the entire patch yet, but I think it makes sense to split it up to make the changes a bit easier to review. I've left some comments to suggest how to split it up and also some comments on the code-changes itself.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1658

SETLT and SETLE should not be in this list, because they have a 1-1 mapping with instructions.
Most of the other nodes need expanding into two nodes (one for ordered/unordered and one for LE/LT), with SETO need expanding to not(unordered).

It would also be nice if these changes could be moved to a separate patch.

1668–1676

These actions are unrelated to VT as passed into the function, so they can be moved out of this function.

3785

It would be nice if you could split up your patch such that each such a code change, lives in its own patch with a set of corresponding tests. That makes the patch a bit more manageable to review.

12632

Please move this change and corresponding tests to a separate patch.

12894

This can actually be simplified to:

bool OverrideNEON = Subtarget->forceStreamingCompatibleSVE() ||
                    (Subtarget->useSVEForFixedLengthVectors() &&
                     (Op.getOpcode() == ISD::VECREDUCE_AND ||
                      Op.getOpcode() == ISD::VECREDUCE_OR ||
                      Op.getOpcode() == ISD::VECREDUCE_XOR ||
                      Op.getOpcode() == ISD::VECREDUCE_FADD ||
                      (Op.getOpcode() != ISD::VECREDUCE_ADD &&
                       SrcVT.getVectorElementType() == MVT::i64)));
if (SrcVT.isScalableVector() ||
    useSVEForFixedLengthVectorVT(SrcVT, OverrideNEON)) {

It would be nice if you could move this change, the changes in addTypeForStreamingSVE and corresponding reduction tests to a separate patch.

13911

Rather than adding this condition here, you can add the condition to isLegalInterleavedAccessType like this:

-  if (Subtarget->useSVEForFixedLengthVectors() &&
-      (VecSize % Subtarget->getMinSVEVectorSizeInBits() == 0 ||
-       (VecSize < Subtarget->getMinSVEVectorSizeInBits() &&
-        isPowerOf2_32(NumElements) && VecSize > 128))) {
+  if (Subtarget->forceStreamingCompatibleSVE() ||
+      (Subtarget->useSVEForFixedLengthVectors() &&
+       (VecSize % Subtarget->getMinSVEVectorSizeInBits() == 0 ||
+        (VecSize < Subtarget->getMinSVEVectorSizeInBits() &&
+         isPowerOf2_32(NumElements) && VecSize > 128)))) {

When you add vscale_range(1,16) to the attributes of the test file, you will get the code you expect.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
3656–3660

Can you move this change to a separate patch and test it with something very simple, such as:

define fp128 @test_streaming_compatible_register_mov(fp128 %q0, fp128 %q1) {
; CHECK-LABEL: test_streaming_compatible_register_mov:
; CHECK:       // %bb.0:
; CHECK-NEXT:    mov     z0.d, z1.d
; CHECK-NEXT:    ret
  ret fp128 %y
}
llvm/test/CodeGen/AArch64/-streaming-mode-fixed-length-fp-arith.ll
1 ↗(On Diff #475442)

The name of this file is incorrect, it should start with sve- instead of -

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-ld2-alloca.ll
20 ↗(On Diff #475442)

This test seems broken, because it's not using the result %strided.vec from the shufflevector, which as we can see from the assembly causes the load + shuffle to be removed entirely.

I've fixed sve-fixed-ld2-alloca.ll in c2600244fc14, can you update this test accordingly?

hassnaa-arm marked 6 inline comments as done.

fix broken ld2-alloca.ll test

[NFC] rearrange lines

Update by main branch

Update by parent patch

Remove testing files that use masked gather/scatter

Update by latest changes in main branch

Remove redundant condition in AArch64TargetTransformInfo.h

sdesmalen requested changes to this revision.Dec 1 2022, 7:57 AM

It seems this patch can be abandoned now.

This revision now requires changes to proceed.Dec 1 2022, 7:57 AM
hassnaa-arm abandoned this revision.Feb 6 2023, 11:47 AM

This patch was split into smaller patches.