This is an archive of the discontinued LLVM Phabricator instance.

[AArch64]: Force generating code compatible to streaming mode
ClosedPublic

Authored by hassnaa-arm on Sep 7 2022, 9:40 AM.

Details

Summary

Add a compile-time flag for enabling streaming mode.
When streaming mode is enabled, lower basic loads and stores of fixed-width vectors;
to generate code that is compatible to streaming mode.

Diff Detail

Event Timeline

hassnaa-arm created this revision.Sep 7 2022, 9:40 AM
hassnaa-arm requested review of this revision.Sep 7 2022, 9:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 7 2022, 9:40 AM
Matt added a subscriber: Matt.Sep 8 2022, 12:11 AM
paulwalker-arm added inline comments.Sep 8 2022, 3:18 AM
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
440

This is going to enable a lot more code paths than it currently tested. Can you explain the rational for the new flag? Does it relate to SME's streaming-compatible mode? or it is wanted for other reasons?

Adding store test cases to sve-fixed-length-masked-stores.ll

Rename force-sve-128bit-vector to force-sve-when-streaming-compatible
Renamed that flag because it's related to streaming-mode,
because during streaming-mode we can't use NEON, so we foce using SVE.

Add RUN line with flag of --force-sve-when-streaming-compatible to sve-fixed-length-masked-loads.ll

Matt added inline comments.Sep 8 2022, 4:21 PM
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
440

This is going to enable a lot more code paths than it currently tested.

Out of curiosity, I've run a quick check for a potentially related issue, https://github.com/llvm/llvm-project/issues/56412

I'm no longer encountering the ICE when compiling "sve-fixed-length-masked-gather.ll" with this option enabled, as in llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll.

At the same time, there's no impact on the generated code (identical assembly); in any case, that's strictly better than ICE.

@hassnaa-arm, I'm wondering, just to be on the safe side, could you possibly run a quick check on your end to make sure that you're not encountering any issues with "sve-fixed-length-masked-gather.ll", either? That, and perhaps even add a RUN line with llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 to that file (as it has caused an ICE with 128-bit SVE compilation before), if that would be no trouble?

@paulwalker-arm, If this patch gets accepted and the above test works fine perhaps that would offer a way to close https://github.com/llvm/llvm-project/issues/56412?

Matt added inline comments.Sep 8 2022, 4:42 PM
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
440

Update: I've tested on the whole file and the ICE does appear, after all.
The difference is that now the affected function is masked_gather_v8f16 (whereas previously compiling masked_gather_v2f16 alone was sufficient to trigger the ICE--now it no longer does).

After compiling with llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll:

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll
1.      Running pass 'Function Pass Manager' on module 'sve-fixed-length-masked-gather.ll'.
2.      Running pass 'AArch64 Instruction Selection' on function '@masked_gather_v8f16'
  #0 0x00007f1e7892db26 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /path/to/llvm-project/llvm/lib/Support/Unix/Signals.inc:573:3
  #1 0x00007f1e7892b9ad llvm::sys::RunSignalHandlers() /path/to/llvm-project/llvm/lib/Support/Signals.cpp:103:20
  #2 0x00007f1e7892bb2c SignalHandler(int) /path/to/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
  #3 0x00007f1e77a42210 __restore_rt (/lib64/libc.so.6+0x3a210)
  #4 0x00007f1e7bec836c llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33
  #5 0x00007f1e7bec836c llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14
  #6 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36
  #7 0x00007f1e7bec847b llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) (.part.0) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:525:32
  #8 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33
  #9 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14
 #10 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36
 #11 0x00007f1e7bec847b llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) (.part.0) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:525:32
 #12 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33
 #13 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14
 #14 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36
. . .

(the remaining part similarly recurring as in the aforementioned GitHub issue).

@Matt: This work is orthogonal to https://github.com/llvm/llvm-project/issues/56412. When in streaming mode gather/scatter instructions are not available so we'll have code to mark the associated intrinsics as illegal and thus they'll be scalarised before reaching code gen. This doesn't take away the importance of the GitHub issue, which will be resolved when we specifically enable gather/scatter code generation for 128-bit vectors.

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
69

Can this be "force-streaming-compatible-mode"? which I believe better reflects your intent.

Force using SVE in streaming mode.

Force SVE in Streaming Mode for all types of load/store

Rename new load/store files

david-arm added inline comments.Oct 4 2022, 7:41 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1392

I think at some point we probably want to combine this with the code below:

if (Subtarget->useSVEForFixedLengthVectors()) {
  for (MVT VT : MVT::integer_fixedlen_vector_valuetypes())
    if (useSVEForFixedLengthVectorVT(VT))
      addTypeForFixedLengthSVE(VT);

The problem is that addTypeForFixedLengthSVE will add a whole bunch of opcodes all at once, which we're probably not ready for.

@hassnaa-arm perhaps you can simplify this to something like:

if (Subtarget->forceSVEInStreamingMode()) {
  for (MVT VT : MVT::integer_fixedlen_vector_valuetypes())
    if (useSVEForFixedLengthVectorVT(VT, true)
      addTypeForStreamingSVE(VT);
  for (MVT VT : MVT::fp_fixedlen_vector_valuetypes())
     if (useSVEForFixedLengthVectorVT(VT, true)
       addTypeForStreamingSVE(VT);
}

where you add a function called addTypeForStreamingSVE a bit similar to addTypeForFixedLengthSVE. For now it would just be:

void addTypeForStreamingSVE(EVT VT) {
  setOperationAction(ISD::ANY_EXTEND, VT, Custom);
  setOperationAction(ISD::ZERO_EXTEND, VT, Custom);
  setOperationAction(ISD::SIGN_EXTEND, VT, Custom);
  setOperationAction(ISD::LOAD, VT, Custom);
  setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);
}

What do you think?

kmclaughlin added inline comments.Oct 4 2022, 9:33 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1392

I also think it would be good to try and simplify this, and for now it might also be worth adding a TODO to explain that these functions will be combined once all of the opcodes have been covered?

llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll
420 ↗(On Diff #464714)

nit: extra whitespace :)

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-ext-loads.ll
8 ↗(On Diff #464714)

The name of this test doesn't look quite right - I think for this one it should be something like @load_zext_v8i8i16, the next one should be @load_zext_v4i16i32, etc. I could be wrong, I am just comparing to some of the existing tests we have in sve-fixed-length-ext-loads.ll.

318 ↗(On Diff #464714)

Is it worth adding tests where the type being extended from is also illegal? Something like this:

%a = load <16 x i16>, <16 x i16>* %ap
%val = sext <16 x i16> %a to <16 x i64>
ret <16 x i64> %val
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-loads.ll
18

There don't seem to be any check lines for any of the VBITS_GE_* labels added here? Maybe if they are not needed you could remove the extra labels, or add some check lines to match the ones you need. I think fixing this will also remove the note added at the bottom of this test.

77

Can you please add a test using a load of type which is illegal for Neon, e.g. 32 x float?

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-store.ll
4 ↗(On Diff #464714)

As with the fixed-length-loads.ll test, I think removing the unused labels here (and in the other test files below) or adding check lines for them will remove the warnings added by the test script.

104 ↗(On Diff #464714)

Can you please also add a test with an illegal Neon type?

Hi @hassnaa-arm, I've not had a chance to fully review this yet, but could you rename the title to something like

[AArch64][SVE]: Lower all types of load/store of 128-bit fixed-width vector using SVE

? This patch is now lowering more than just masked loads/stores.

hassnaa-arm marked 9 inline comments as done.

set custom action only in case of streaming mode
add some illegal NEON tests

hassnaa-arm retitled this revision from [AArch64-SVE]: lower masked load/store of 128-bit fixed-width vectors to [AArch64-SVE]: lower all types of loads and stores of fixed-width vector.Oct 6 2022, 4:02 AM
sdesmalen added inline comments.Oct 6 2022, 8:39 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1393

nit: unnecessary curly braces here and below.

12552

Are you doing these as part of the same patch, because tests like sve-fixed-length-int-shifts.ll require loads and stores to work?

I wonder if you can still split out the unpredicated, non-extending/truncating loads/stores from this patch, such that you can have:

  1. patch for basic loads/stores + anything required to make these work (including tests)
  2. patches for shifts, concat_vectors, build_vector, vector_shuffle, extract_vector_elt, extract_subvector (or at least as many that are not required for (1)), with corresponding tests (e.g. sve-fixed-length-int-shifts.ll -> sve-streaming-compatible-fixed-length-int-shifts.ll), because these tests are currently missing.
  3. patch for masked and extending/truncating loads and stores.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
3036 ↗(On Diff #465685)

I think you can just write:

let AddedComplexity = 1, Predicates = [IsForcingSVEDisabled] in {
  ...
}

which would avoid the extra indentation.

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
69

Can you rename this variable to ForceStreamingCompatibleSVE? (likewise change the name of the flag to -force-streaming-compatible-sve)

The current name ForceSVEWhenStreamingCompatible suggests to use the full range of SVE instructions when in streaming-compatible mode, even the instructions that would be illegal in that mode, but that would be incorrect.

hassnaa-arm marked 2 inline comments as done.Oct 7 2022, 3:07 AM
hassnaa-arm added inline comments.
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
69

so there are some SVE instruction that are illegal in streaming mode ? like what ?
because I was checking only for NEON illegal instructions, not SVE illegal instruction.

hassnaa-arm updated this revision to Diff 466187.EditedOct 7 2022, 2:46 PM

Split out the patch. this patch now has only related work to basic load and store.

Remove commented lines

hassnaa-arm retitled this revision from [AArch64-SVE]: lower all types of loads and stores of fixed-width vector to [AArch64-SVE]: Force using streaming compatible mode.Oct 7 2022, 3:06 PM
hassnaa-arm edited the summary of this revision. (Show Details)
hassnaa-arm retitled this revision from [AArch64-SVE]: Force using streaming compatible mode to [AArch64]: Force generating code compatible to streaming mode.Oct 9 2022, 3:18 PM
hassnaa-arm edited the summary of this revision. (Show Details)

Rename flag of force-streaming-compatible-mode to force-streaming-mode-compatible-sve

Thanks @hassnaa-arm! I just left a few more nits mostly around the names, but other than that it looks really good!

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12187

nit: comment can be removed?

12550

nit: comment can be removed?

12562

nit: comment can be removed?

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
69

Streaming SVE is a subset of SVE, in that some SVE instructions (e.g. gather/scatter) are not valid in Streaming Mode.

69

nit: I think that 'mode' is kind of implied from 'streaming compatible', so you remove -mode from the name, i.e. force-streaming-mode-compatible-sve -> force-streaming-compatible-sve.
Same request for the name of the variable, i.e. ForceStreamingModeCompatibleSVE -> ForceStreamingCompatibleSVE

441

Should this return true always and instead have assert(hasSVE() && "Expected SVE to be available") ?
If someone forces using streaming-compatible code, SVE must be available. (and given that its not a user-exposed feature in Clang, it's fine for the compiler to crash if someone would use this feature while forgetting to set +sve somehow)

447

nit: forceStreamingCompatibleSVE (see other comment above)

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1393

Unlike useSVEForFixedLengthVectors() mode where the MVTs are not know ahead of time, the use case for forceStreamingModeCompatibleSVE() mode is specific to 128-bit and 64-bit vectors and so you shouldn't need to iterate across all vector types.

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-stores.ll
4–18

You shouldn't need to test all these combinations. It should be sufficient to test without any -aarch64-sve-vector-bits-min= options as that's the expected use case. For the tests themselves you want to add some that use 256bit vectors to verify we don't emit neon instructions as part of type legalisation.

hassnaa-arm marked 8 inline comments as done.Oct 11 2022, 3:53 AM

set operation action as custom only for 128-bit and 64-bit vectors instead of all types

sdesmalen accepted this revision.Oct 11 2022, 4:01 AM

Thanks for the changes to this patch. This one looks good to me now!

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12187

nit: it seems you missed this one (can be removed)

This revision is now accepted and ready to land.Oct 11 2022, 4:01 AM
paulwalker-arm added inline comments.Oct 11 2022, 7:39 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1393

I don't see tests for this and some of the other MVTs. Do we need explicit handling for MVT::v1i64 and MVT::v1f64? I would have thought these would just emit a scalar access, although there's no tests to show this.

1395

Is this check (plus the one for the float loop) necessary? I would expect that when forceStreamingCompatibleSVE() returns true we have no choice but to enable the custom lowering.

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
440–443
if (forceStreamingCompatibleSVE())
  return true;

Doing this might mean useSVEForFixedLengthVectors can remain defined in the header file.

452

Should this also include || hasSME()?

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-stores.ll
27

The patch summary mentions lowering stores and you've added this test but I don't see any code to enable such lowering and hence we are seeing NEON str instructions.

hassnaa-arm marked 5 inline comments as done.

Updated by parent patch

Add additional test cases

hassnaa-arm marked an inline comment as not done.Oct 13 2022, 5:36 AM

Remove custom-lowering ISD::load.
For fixed-length load/store, no need for custom-lowering ISD::load.
It was added I thought that ldr is illegal in streaming mode, but it is legal, so no need for custom-lowering ISD::load now.

paulwalker-arm added inline comments.Oct 13 2022, 4:21 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1392–1400

I understand why this code block and addTypeForStreamingSVE() exist, but given they don't do anything within this patch anymore I think they're best moved into one of your other patches.

5757–5759

Is this still necessary now that you're no longer custom lowering ISD::LOAD for NEON sized vectors?

llvm/lib/Target/AArch64/AArch64InstrInfo.td
7137–7142

Up to you but personally I think NotInStreamingSVEMode reads better.

hassnaa-arm marked an inline comment as done.

Remove 'addTypeForStreamingSVE()' as it's not needed now.
Now, I don't use set Custom operation action for any node, so not need for addTypeForStreamingSVE().

hassnaa-arm marked 2 inline comments as done.Oct 14 2022, 8:16 AM
paulwalker-arm accepted this revision.Oct 14 2022, 8:32 AM

As discussed offline there's more we can do here to improve code quality but this'll come as you increase ISD node coverage. I've one minor issue but otherwise this looks good to me.

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
441

I guess this should match the code below, although I'm not quite sure why the assert it needed.

hassnaa-arm added inline comments.Oct 14 2022, 8:43 AM
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
441

If someone forces using streaming-compatible code, SVE must be available. (and given that its not a user-exposed feature in Clang, it's fine for the compiler to crash if someone would use this feature while forgetting to set +sve somehow)

This revision was landed with ongoing or failed builds.Oct 14 2022, 10:47 AM
This revision was automatically updated to reflect the committed changes.