Add a compile-time flag for enabling streaming mode.
When streaming mode is enabled, lower basic loads and stores of fixed-width vectors;
to generate code that is compatible to streaming mode.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AArch64/AArch64Subtarget.cpp | ||
---|---|---|
440 | This is going to enable a lot more code paths than it currently tested. Can you explain the rational for the new flag? Does it relate to SME's streaming-compatible mode? or it is wanted for other reasons? |
Rename force-sve-128bit-vector to force-sve-when-streaming-compatible
Renamed that flag because it's related to streaming-mode,
because during streaming-mode we can't use NEON, so we foce using SVE.
Add RUN line with flag of --force-sve-when-streaming-compatible to sve-fixed-length-masked-loads.ll
llvm/lib/Target/AArch64/AArch64Subtarget.cpp | ||
---|---|---|
440 |
Out of curiosity, I've run a quick check for a potentially related issue, https://github.com/llvm/llvm-project/issues/56412 I'm no longer encountering the ICE when compiling "sve-fixed-length-masked-gather.ll" with this option enabled, as in llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll. At the same time, there's no impact on the generated code (identical assembly); in any case, that's strictly better than ICE. @hassnaa-arm, I'm wondering, just to be on the safe side, could you possibly run a quick check on your end to make sure that you're not encountering any issues with "sve-fixed-length-masked-gather.ll", either? That, and perhaps even add a RUN line with llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 to that file (as it has caused an ICE with 128-bit SVE compilation before), if that would be no trouble? @paulwalker-arm, If this patch gets accepted and the above test works fine perhaps that would offer a way to close https://github.com/llvm/llvm-project/issues/56412? |
llvm/lib/Target/AArch64/AArch64Subtarget.cpp | ||
---|---|---|
440 | Update: I've tested on the whole file and the ICE does appear, after all. After compiling with llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll: PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll 1. Running pass 'Function Pass Manager' on module 'sve-fixed-length-masked-gather.ll'. 2. Running pass 'AArch64 Instruction Selection' on function '@masked_gather_v8f16' #0 0x00007f1e7892db26 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /path/to/llvm-project/llvm/lib/Support/Unix/Signals.inc:573:3 #1 0x00007f1e7892b9ad llvm::sys::RunSignalHandlers() /path/to/llvm-project/llvm/lib/Support/Signals.cpp:103:20 #2 0x00007f1e7892bb2c SignalHandler(int) /path/to/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1 #3 0x00007f1e77a42210 __restore_rt (/lib64/libc.so.6+0x3a210) #4 0x00007f1e7bec836c llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33 #5 0x00007f1e7bec836c llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14 #6 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36 #7 0x00007f1e7bec847b llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) (.part.0) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:525:32 #8 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33 #9 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14 #10 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36 #11 0x00007f1e7bec847b llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) (.part.0) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:525:32 #12 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33 #13 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14 #14 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36 . . . (the remaining part similarly recurring as in the aforementioned GitHub issue). |
@Matt: This work is orthogonal to https://github.com/llvm/llvm-project/issues/56412. When in streaming mode gather/scatter instructions are not available so we'll have code to mark the associated intrinsics as illegal and thus they'll be scalarised before reaching code gen. This doesn't take away the importance of the GitHub issue, which will be resolved when we specifically enable gather/scatter code generation for 128-bit vectors.
llvm/lib/Target/AArch64/AArch64Subtarget.cpp | ||
---|---|---|
69 | Can this be "force-streaming-compatible-mode"? which I believe better reflects your intent. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1392 | I think at some point we probably want to combine this with the code below: if (Subtarget->useSVEForFixedLengthVectors()) { for (MVT VT : MVT::integer_fixedlen_vector_valuetypes()) if (useSVEForFixedLengthVectorVT(VT)) addTypeForFixedLengthSVE(VT); The problem is that addTypeForFixedLengthSVE will add a whole bunch of opcodes all at once, which we're probably not ready for. @hassnaa-arm perhaps you can simplify this to something like: if (Subtarget->forceSVEInStreamingMode()) { for (MVT VT : MVT::integer_fixedlen_vector_valuetypes()) if (useSVEForFixedLengthVectorVT(VT, true) addTypeForStreamingSVE(VT); for (MVT VT : MVT::fp_fixedlen_vector_valuetypes()) if (useSVEForFixedLengthVectorVT(VT, true) addTypeForStreamingSVE(VT); } where you add a function called addTypeForStreamingSVE a bit similar to addTypeForFixedLengthSVE. For now it would just be: void addTypeForStreamingSVE(EVT VT) { setOperationAction(ISD::ANY_EXTEND, VT, Custom); setOperationAction(ISD::ZERO_EXTEND, VT, Custom); setOperationAction(ISD::SIGN_EXTEND, VT, Custom); setOperationAction(ISD::LOAD, VT, Custom); setOperationAction(ISD::CONCAT_VECTORS, VT, Custom); } What do you think? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1392 | I also think it would be good to try and simplify this, and for now it might also be worth adding a TODO to explain that these functions will be combined once all of the opcodes have been covered? | |
llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll | ||
420 ↗ | (On Diff #464714) | nit: extra whitespace :) |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-ext-loads.ll | ||
8 ↗ | (On Diff #464714) | The name of this test doesn't look quite right - I think for this one it should be something like @load_zext_v8i8i16, the next one should be @load_zext_v4i16i32, etc. I could be wrong, I am just comparing to some of the existing tests we have in sve-fixed-length-ext-loads.ll. |
318 ↗ | (On Diff #464714) | Is it worth adding tests where the type being extended from is also illegal? Something like this: %a = load <16 x i16>, <16 x i16>* %ap %val = sext <16 x i16> %a to <16 x i64> ret <16 x i64> %val |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-loads.ll | ||
18 | There don't seem to be any check lines for any of the VBITS_GE_* labels added here? Maybe if they are not needed you could remove the extra labels, or add some check lines to match the ones you need. I think fixing this will also remove the note added at the bottom of this test. | |
77 | Can you please add a test using a load of type which is illegal for Neon, e.g. 32 x float? | |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-store.ll | ||
4 ↗ | (On Diff #464714) | As with the fixed-length-loads.ll test, I think removing the unused labels here (and in the other test files below) or adding check lines for them will remove the warnings added by the test script. |
104 ↗ | (On Diff #464714) | Can you please also add a test with an illegal Neon type? |
Hi @hassnaa-arm, I've not had a chance to fully review this yet, but could you rename the title to something like
[AArch64][SVE]: Lower all types of load/store of 128-bit fixed-width vector using SVE
? This patch is now lowering more than just masked loads/stores.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1393 | nit: unnecessary curly braces here and below. | |
12552 | Are you doing these as part of the same patch, because tests like sve-fixed-length-int-shifts.ll require loads and stores to work? I wonder if you can still split out the unpredicated, non-extending/truncating loads/stores from this patch, such that you can have:
| |
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | ||
3036 ↗ | (On Diff #465685) | I think you can just write: let AddedComplexity = 1, Predicates = [IsForcingSVEDisabled] in { ... } which would avoid the extra indentation. |
llvm/lib/Target/AArch64/AArch64Subtarget.cpp | ||
69 | Can you rename this variable to ForceStreamingCompatibleSVE? (likewise change the name of the flag to -force-streaming-compatible-sve) The current name ForceSVEWhenStreamingCompatible suggests to use the full range of SVE instructions when in streaming-compatible mode, even the instructions that would be illegal in that mode, but that would be incorrect. |
llvm/lib/Target/AArch64/AArch64Subtarget.cpp | ||
---|---|---|
69 | so there are some SVE instruction that are illegal in streaming mode ? like what ? |
Rename flag of force-streaming-compatible-mode to force-streaming-mode-compatible-sve
Thanks @hassnaa-arm! I just left a few more nits mostly around the names, but other than that it looks really good!
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
12187 | nit: comment can be removed? | |
12550 | nit: comment can be removed? | |
12562 | nit: comment can be removed? | |
llvm/lib/Target/AArch64/AArch64Subtarget.cpp | ||
69 | Streaming SVE is a subset of SVE, in that some SVE instructions (e.g. gather/scatter) are not valid in Streaming Mode. | |
69 | nit: I think that 'mode' is kind of implied from 'streaming compatible', so you remove -mode from the name, i.e. force-streaming-mode-compatible-sve -> force-streaming-compatible-sve. | |
441 | Should this return true always and instead have assert(hasSVE() && "Expected SVE to be available") ? | |
447 | nit: forceStreamingCompatibleSVE (see other comment above) |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1393 | Unlike useSVEForFixedLengthVectors() mode where the MVTs are not know ahead of time, the use case for forceStreamingModeCompatibleSVE() mode is specific to 128-bit and 64-bit vectors and so you shouldn't need to iterate across all vector types. | |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-stores.ll | ||
4–18 | You shouldn't need to test all these combinations. It should be sufficient to test without any -aarch64-sve-vector-bits-min= options as that's the expected use case. For the tests themselves you want to add some that use 256bit vectors to verify we don't emit neon instructions as part of type legalisation. |
set operation action as custom only for 128-bit and 64-bit vectors instead of all types
Thanks for the changes to this patch. This one looks good to me now!
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
12187 | nit: it seems you missed this one (can be removed) |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1393 | I don't see tests for this and some of the other MVTs. Do we need explicit handling for MVT::v1i64 and MVT::v1f64? I would have thought these would just emit a scalar access, although there's no tests to show this. | |
1395 | Is this check (plus the one for the float loop) necessary? I would expect that when forceStreamingCompatibleSVE() returns true we have no choice but to enable the custom lowering. | |
llvm/lib/Target/AArch64/AArch64Subtarget.cpp | ||
440–443 | if (forceStreamingCompatibleSVE()) return true; Doing this might mean useSVEForFixedLengthVectors can remain defined in the header file. | |
452 | Should this also include || hasSME()? | |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-stores.ll | ||
27 | The patch summary mentions lowering stores and you've added this test but I don't see any code to enable such lowering and hence we are seeing NEON str instructions. |
Remove custom-lowering ISD::load.
For fixed-length load/store, no need for custom-lowering ISD::load.
It was added I thought that ldr is illegal in streaming mode, but it is legal, so no need for custom-lowering ISD::load now.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1392–1400 | I understand why this code block and addTypeForStreamingSVE() exist, but given they don't do anything within this patch anymore I think they're best moved into one of your other patches. | |
5757–5759 | Is this still necessary now that you're no longer custom lowering ISD::LOAD for NEON sized vectors? | |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
7137–7142 | Up to you but personally I think NotInStreamingSVEMode reads better. |
Remove 'addTypeForStreamingSVE()' as it's not needed now.
Now, I don't use set Custom operation action for any node, so not need for addTypeForStreamingSVE().
As discussed offline there's more we can do here to improve code quality but this'll come as you increase ISD node coverage. I've one minor issue but otherwise this looks good to me.
llvm/lib/Target/AArch64/AArch64Subtarget.cpp | ||
---|---|---|
441 | I guess this should match the code below, although I'm not quite sure why the assert it needed. |
llvm/lib/Target/AArch64/AArch64Subtarget.cpp | ||
---|---|---|
441 | If someone forces using streaming-compatible code, SVE must be available. (and given that its not a user-exposed feature in Clang, it's fine for the compiler to crash if someone would use this feature while forgetting to set +sve somehow) |
I think at some point we probably want to combine this with the code below:
The problem is that addTypeForFixedLengthSVE will add a whole bunch of opcodes all at once, which we're probably not ready for.
@hassnaa-arm perhaps you can simplify this to something like:
where you add a function called addTypeForStreamingSVE a bit similar to addTypeForFixedLengthSVE. For now it would just be:
What do you think?