This is an archive of the discontinued LLVM Phabricator instance.

Differential D133433

[AArch64]: Force generating code compatible to streaming mode
ClosedPublic

Authored by hassnaa-arm on Sep 7 2022, 9:40 AM.

Details

Reviewers

paulwalker-arm
sdesmalen
david-arm
kmclaughlin

Commits

rG2c72d90ecc69: [AArch64-SVE]: Force generating code compatible to streaming mode.

Summary

Add a compile-time flag for enabling streaming mode.
When streaming mode is enabled, lower basic loads and stores of fixed-width vectors
to generate code that is compatible with streaming mode.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hassnaa-arm created this revision. Sep 7 2022, 9:40 AM

Herald added a project: Restricted Project. Sep 7 2022, 9:40 AM

Herald added subscribers: ctetreau, hiraditya, kristof.beyls, tschuett.

hassnaa-arm requested review of this revision. Sep 7 2022, 9:40 AM

Herald added a project: Restricted Project. Sep 7 2022, 9:40 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B185433: Diff 458482. Sep 7 2022, 10:29 AM

Matt added a subscriber: Matt. Sep 8 2022, 12:11 AM

hassnaa-arm added a reviewer: david-arm. Sep 8 2022, 3:15 AM

paulwalker-arm added inline comments. Sep 8 2022, 3:18 AM

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
440	This is going to enable a lot more code paths than is currently tested. Can you explain the rationale for the new flag? Does it relate to SME's streaming-compatible mode, or is it wanted for other reasons?

Adding store test cases to sve-fixed-length-masked-stores.ll

Harbormaster completed remote builds in B185604: Diff 458713. Sep 8 2022, 7:17 AM

Rename force-sve-128bit-vector to force-sve-when-streaming-compatible.
The flag was renamed because it relates to streaming mode:
NEON can't be used in streaming mode, so we force the use of SVE.

Add RUN line with flag of --force-sve-when-streaming-compatible to sve-fixed-length-masked-loads.ll

Harbormaster completed remote builds in B185621: Diff 458737. Sep 8 2022, 8:25 AM

Matt added inline comments. Sep 8 2022, 4:21 PM

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
440	This is going to enable a lot more code paths than is currently tested.

Out of curiosity, I've run a quick check for a potentially related issue, https://github.com/llvm/llvm-project/issues/56412

I'm no longer encountering the ICE when compiling "sve-fixed-length-masked-gather.ll" with this option enabled, as in `llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll`. At the same time, there's no impact on the generated code (identical assembly); in any case, that's strictly better than ICE.

@hassnaa-arm, I'm wondering, just to be on the safe side, could you possibly run a quick check on your end to make sure that you're not encountering any issues with "sve-fixed-length-masked-gather.ll", either? That, and perhaps even add a RUN line with `llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128` to that file (as it has caused an ICE with 128-bit SVE compilation before), if that would be no trouble?

@paulwalker-arm, if this patch gets accepted and the above test works fine, perhaps that would offer a way to close https://github.com/llvm/llvm-project/issues/56412?

Matt added inline comments. Sep 8 2022, 4:42 PM

llvm/lib/Target/AArch64/AArch64Subtarget.cpp

440

Update: I've tested on the whole file and the ICE does appear, after all.
The difference is that now the affected function is masked_gather_v8f16 (whereas previously compiling masked_gather_v2f16 alone was sufficient to trigger the ICE--now it no longer does).

After compiling with llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll:

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll
1.      Running pass 'Function Pass Manager' on module 'sve-fixed-length-masked-gather.ll'.
2.      Running pass 'AArch64 Instruction Selection' on function '@masked_gather_v8f16'
  #0 0x00007f1e7892db26 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /path/to/llvm-project/llvm/lib/Support/Unix/Signals.inc:573:3
  #1 0x00007f1e7892b9ad llvm::sys::RunSignalHandlers() /path/to/llvm-project/llvm/lib/Support/Signals.cpp:103:20
  #2 0x00007f1e7892bb2c SignalHandler(int) /path/to/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
  #3 0x00007f1e77a42210 __restore_rt (/lib64/libc.so.6+0x3a210)
  #4 0x00007f1e7bec836c llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33
  #5 0x00007f1e7bec836c llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14
  #6 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36
  #7 0x00007f1e7bec847b llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) (.part.0) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:525:32
  #8 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33
  #9 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14
 #10 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36
 #11 0x00007f1e7bec847b llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) (.part.0) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:525:32
 #12 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33
 #13 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14
 #14 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36
. . .

(the remaining part similarly recurring as in the aforementioned GitHub issue).

@Matt: This work is orthogonal to https://github.com/llvm/llvm-project/issues/56412. When in streaming mode gather/scatter instructions are not available so we'll have code to mark the associated intrinsics as illegal and thus they'll be scalarised before reaching code gen. This doesn't take away the importance of the GitHub issue, which will be resolved when we specifically enable gather/scatter code generation for 128-bit vectors.
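
To illustrate what "scalarised" means in the reply above: a masked gather can be expressed as one ordinary scalar load per active lane, which needs no gather instruction at all. The following is a minimal standalone C++ sketch of that semantics only. The name `scalarisedMaskedGather` and its parameters are hypothetical, and this is not the code that LLVM's scalarisation actually emits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar expansion of a masked gather: each active lane performs one plain
// load from its own address; inactive lanes keep the pass-through value.
// (Hypothetical helper for illustration, not an LLVM API.)
template <typename T, std::size_t N>
std::array<T, N> scalarisedMaskedGather(const std::array<const T *, N> &Ptrs,
                                        const std::array<bool, N> &Mask,
                                        const std::array<T, N> &PassThru) {
  std::array<T, N> Result = PassThru;
  for (std::size_t I = 0; I < N; ++I) {
    if (!Mask[I])
      continue; // inactive lane: no memory access happens
    assert(Ptrs[I] != nullptr && "active lanes need a valid address");
    Result[I] = *Ptrs[I];
  }
  return Result;
}
```

Because every lane becomes an independent scalar load, the expansion only relies on instructions that remain available in streaming mode, at the cost of losing the vector memory access.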

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
69	Can this be "force-streaming-compatible-mode"? which I believe better reflects your intent.

@paulwalker-arm: I see, thanks!

Force using SVE in streaming mode.

hassnaa-arm added a reviewer: kmclaughlin. Sep 29 2022, 8:55 AM

Harbormaster completed remote builds in B189411: Diff 463893. Sep 29 2022, 9:22 AM

Force SVE in Streaming Mode for all types of load/store

Rename new load/store files

Harbormaster completed remote builds in B190002: Diff 464714. Oct 3 2022, 10:20 AM

david-arm added inline comments. Oct 4 2022, 7:41 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

1392

I think at some point we probably want to combine this with the code below:

if (Subtarget->useSVEForFixedLengthVectors()) {
  for (MVT VT : MVT::integer_fixedlen_vector_valuetypes())
    if (useSVEForFixedLengthVectorVT(VT))
      addTypeForFixedLengthSVE(VT);

The problem is that addTypeForFixedLengthSVE will add a whole bunch of opcodes all at once, which we're probably not ready for.

@hassnaa-arm perhaps you can simplify this to something like:

if (Subtarget->forceSVEInStreamingMode()) {
  for (MVT VT : MVT::integer_fixedlen_vector_valuetypes())
    if (useSVEForFixedLengthVectorVT(VT, true))
      addTypeForStreamingSVE(VT);
  for (MVT VT : MVT::fp_fixedlen_vector_valuetypes())
    if (useSVEForFixedLengthVectorVT(VT, true))
      addTypeForStreamingSVE(VT);
}

where you add a function called addTypeForStreamingSVE a bit similar to addTypeForFixedLengthSVE. For now it would just be:

void addTypeForStreamingSVE(EVT VT) {
  setOperationAction(ISD::ANY_EXTEND, VT, Custom);
  setOperationAction(ISD::ZERO_EXTEND, VT, Custom);
  setOperationAction(ISD::SIGN_EXTEND, VT, Custom);
  setOperationAction(ISD::LOAD, VT, Custom);
  setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);
}

What do you think?
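
The idea behind the suggested helper is a per-(opcode, type) action table in which only a short, explicit list of opcodes is switched to Custom for each type. A toy model of that bookkeeping might look like the sketch below. `ToyLowering`, `Op`, `VT` and `Action` are hypothetical stand-ins invented for this example; they only imitate the shape of the `setOperationAction` calls quoted above and are not LLVM's real classes.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <utility>

// Hypothetical stand-ins for LLVM's ISD opcodes, MVTs and legalize actions.
enum class Op { AnyExtend, ZeroExtend, SignExtend, Load, ConcatVectors, Store };
enum class VT { v8i8, v16i8, v4i32, v2f64 };
enum class Action { Legal, Custom };

class ToyLowering {
  std::map<std::pair<Op, VT>, Action> Actions;

public:
  void setOperationAction(Op O, VT T, Action A) { Actions[{O, T}] = A; }

  // In this toy, anything not registered explicitly counts as Legal.
  Action getOperationAction(Op O, VT T) const {
    auto It = Actions.find({O, T});
    return It == Actions.end() ? Action::Legal : It->second;
  }

  // Same shape as the suggested helper: a short, explicit opcode list.
  void addTypeForStreamingSVE(VT T) {
    for (Op O : {Op::AnyExtend, Op::ZeroExtend, Op::SignExtend, Op::Load,
                 Op::ConcatVectors})
      setOperationAction(O, T, Action::Custom);
    assert(getOperationAction(Op::Load, T) == Action::Custom);
  }
};
```

Keeping the opcode list inside one small helper means new opcodes can be added one at a time as they gain test coverage, which is the concern raised about `addTypeForFixedLengthSVE` enabling many opcodes at once.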

kmclaughlin added inline comments. Oct 4 2022, 9:33 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1392	I also think it would be good to try and simplify this, and for now it might also be worth adding a `TODO` to explain that these functions will be combined once all of the opcodes have been covered?
llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll
420 ↗	(On Diff #464714)	nit: extra whitespace :)
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-ext-loads.ll
8 ↗	(On Diff #464714)	The name of this test doesn't look quite right - I think for this one it should be something like `@load_zext_v8i8i16`, the next one should be `@load_zext_v4i16i32`, etc. I could be wrong, I am just comparing to some of the existing tests we have in `sve-fixed-length-ext-loads.ll`.
318 ↗	(On Diff #464714)	Is it worth adding tests where the type being extended from is also illegal? Something like this:
  %a = load <16 x i16>, <16 x i16>* %ap
  %val = sext <16 x i16> %a to <16 x i64>
  ret <16 x i64> %val
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-loads.ll
18	There don't seem to be any check lines for any of the `VBITS_GE_*` labels added here? Maybe if they are not needed you could remove the extra labels, or add some check lines to match the ones you need. I think fixing this will also remove the note added at the bottom of this test.
77	Can you please add a test using a load of type which is illegal for Neon, e.g. `32 x float`?
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-store.ll
4 ↗	(On Diff #464714)	As with the fixed-length-loads.ll test, I think removing the unused labels here (and in the other test files below) or adding check lines for them will remove the warnings added by the test script.
104 ↗	(On Diff #464714)	Can you please also add a test with an illegal Neon type?

Hi @hassnaa-arm, I've not had a chance to fully review this yet, but could you rename the title to something like

[AArch64][SVE]: Lower all types of load/store of 128-bit fixed-width vector using SVE

? This patch is now lowering more than just masked loads/stores.

david-arm mentioned this in D135324: [AArch64-SVE]: force using SVE in streaming mode to lower arithmetic and logical fixed-width vector ops.. Oct 6 2022, 1:10 AM

set custom action only in case of streaming mode
add some illegal NEON tests

hassnaa-arm added a child revision: D135324: [AArch64-SVE]: force using SVE in streaming mode to lower arithmetic and logical fixed-width vector ops.. Oct 6 2022, 3:57 AM

hassnaa-arm retitled this revision from [AArch64-SVE]: lower masked load/store of 128-bit fixed-width vectors to [AArch64-SVE]: lower all types of loads and stores of fixed-width vector. Oct 6 2022, 4:02 AM

Harbormaster completed remote builds in B190695: Diff 465685. Oct 6 2022, 4:43 AM

sdesmalen added inline comments. Oct 6 2022, 8:39 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1393	nit: unnecessary curly braces here and below.
12552	Are you doing these as part of the same patch, because tests like `sve-fixed-length-int-shifts.ll` require loads and stores to work? I wonder if you can still split out the unpredicated, non-extending/truncating loads/stores from this patch, such that you can have:
1. patch for basic loads/stores + anything required to make these work (including tests)
2. patches for shifts, concat_vectors, build_vector, vector_shuffle, extract_vector_elt, extract_subvector (or at least as many that are not required for (1)), with corresponding tests (e.g. `sve-fixed-length-int-shifts.ll` -> `sve-streaming-compatible-fixed-length-int-shifts.ll`), because these tests are currently missing.
3. patch for masked and extending/truncating loads and stores.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
3036 ↗	(On Diff #465685)	I think you can just write:
let AddedComplexity = 1, Predicates = [IsForcingSVEDisabled] in {
  ...
}
which would avoid the extra indentation.
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
69	Can you rename this variable to `ForceStreamingCompatibleSVE`? (likewise change the name of the flag to `-force-streaming-compatible-sve`) The current name `ForceSVEWhenStreamingCompatible` suggests to use the full range of SVE instructions when in streaming-compatible mode, even the instructions that would be illegal in that mode, but that would be incorrect.

hassnaa-arm marked 2 inline comments as done. Oct 7 2022, 3:07 AM

hassnaa-arm added inline comments.

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
69	So there are some SVE instructions that are illegal in streaming mode? Like what? I was only checking for illegal NEON instructions, not illegal SVE instructions.

Split out the patch. This patch now contains only the work related to basic loads and stores.

Remove commented lines

hassnaa-arm retitled this revision from [AArch64-SVE]: lower all types of loads and stores of fixed-width vector to [AArch64-SVE]: Force using streaming compatible mode. Oct 7 2022, 3:06 PM

hassnaa-arm edited the summary of this revision.

Harbormaster completed remote builds in B191031: Diff 466195. Oct 7 2022, 3:56 PM

hassnaa-arm retitled this revision from [AArch64-SVE]: Force using streaming compatible mode to [AArch64]: Force generating code compatible to streaming mode. Oct 9 2022, 3:18 PM

hassnaa-arm edited the summary of this revision.

Rename flag of force-streaming-compatible-mode to force-streaming-mode-compatible-sve

hassnaa-arm added a child revision: D135564: [AArch64-SVE]: Force generating code compatible to streaming mode.. Oct 10 2022, 2:08 AM

Harbormaster completed remote builds in B191234: Diff 466452. Oct 10 2022, 2:36 AM

Thanks @hassnaa-arm! I just left a few more nits mostly around the names, but other than that it looks really good!

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12187	nit: comment can be removed?
12550	nit: comment can be removed?
12562	nit: comment can be removed?
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
69	Streaming SVE is a subset of SVE, in that some SVE instructions (e.g. gather/scatter) are not valid in Streaming Mode.
69	nit: I think that 'mode' is kind of implied from 'streaming compatible', so you can remove `-mode` from the name, i.e. `force-streaming-mode-compatible-sve -> force-streaming-compatible-sve`. Same request for the name of the variable, i.e. `ForceStreamingModeCompatibleSVE -> ForceStreamingCompatibleSVE`
441	Should this return `true` always and instead have `assert(hasSVE() && "Expected SVE to be available")` ? If someone forces using streaming-compatible code, SVE must be available. (and given that it's not a user-exposed feature in Clang, it's fine for the compiler to crash if someone would use this feature while forgetting to set `+sve` somehow)
447	nit: `forceStreamingCompatibleSVE` (see other comment above)
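
The request for line 441 above can be sketched as follows. `ToySubtarget` is a hypothetical, heavily simplified stand-in written for this example, not the real `AArch64Subtarget`. In particular, the behaviour when the flag is off (SVE preferred only from 256-bit registers upwards) is an assumption of the sketch and is not stated anywhere in this review.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for AArch64Subtarget.
struct ToySubtarget {
  bool HasSVE = false;
  bool ForceStreamingCompatibleSVE = false; // models the hidden llc flag
  unsigned MinSVEVectorSizeInBits = 0;

  bool hasSVE() const { return HasSVE; }
  bool forceStreamingCompatibleSVE() const {
    return ForceStreamingCompatibleSVE;
  }

  bool useSVEForFixedLengthVectors() const {
    if (forceStreamingCompatibleSVE()) {
      // Forcing streaming-compatible code without +sve is a usage error, so
      // an assertion failure is considered acceptable here.
      assert(hasSVE() && "Expected SVE to be available");
      return true;
    }
    // Assumption of this sketch: without the flag, SVE is only preferred
    // over NEON when registers are known to be at least 256 bits wide.
    return hasSVE() && MinSVEVectorSizeInBits >= 256;
  }
};
```

With the flag set, the function answers `true` regardless of the known register width, which is what lets 64-bit and 128-bit vectors take the SVE path.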

paulwalker-arm added inline comments. Oct 10 2022, 11:15 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1393	Unlike `useSVEForFixedLengthVectors()` mode where the MVTs are not known ahead of time, the use case for `forceStreamingModeCompatibleSVE()` mode is specific to 128-bit and 64-bit vectors and so you shouldn't need to iterate across all vector types.
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-stores.ll
4–18	You shouldn't need to test all these combinations. It should be sufficient to test without any `-aarch64-sve-vector-bits-min=` options as that's the expected use case. For the tests themselves you want to add some that use 256bit vectors to verify we don't emit neon instructions as part of type legalisation.

hassnaa-arm marked 8 inline comments as done. Oct 11 2022, 3:53 AM

set operation action as custom only for 128-bit and 64-bit vectors instead of all types
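
As a rough illustration of restricting the work to NEON-sized types, the sketch below filters a candidate list down to vectors whose total width is 64 or 128 bits. `FixedVT`, `isNeonSized` and `typesNeedingStreamingLowering` are hypothetical names made up for this example; the actual patch works on LLVM's `MVT` values instead.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of a fixed-length vector type,
// e.g. {4, 32} stands for v4i32.
struct FixedVT {
  unsigned NumElts;
  unsigned EltBits;
};

constexpr unsigned sizeInBits(FixedVT VT) { return VT.NumElts * VT.EltBits; }

// NEON registers are 64 or 128 bits wide, so only those sizes are selected.
constexpr bool isNeonSized(FixedVT VT) {
  return sizeInBits(VT) == 64 || sizeInBits(VT) == 128;
}

// Keep only the candidates that would otherwise be lowered with NEON.
inline std::vector<FixedVT>
typesNeedingStreamingLowering(const std::vector<FixedVT> &Candidates) {
  std::vector<FixedVT> Result;
  for (FixedVT VT : Candidates) {
    assert(VT.NumElts > 0 && VT.EltBits > 0 && "malformed vector type");
    if (isNeonSized(VT))
      Result.push_back(VT);
  }
  return Result;
}
```

Wider types such as a 256-bit vector are left out of the list on purpose: they are not legal NEON types in the first place, which is why the review separately asks for tests showing that no NEON instructions appear when such types are legalised.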

Thanks for the changes to this patch. This one looks good to me now!

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12187	nit: it seems you missed this one (can be removed)

This revision is now accepted and ready to land. Oct 11 2022, 4:01 AM

Harbormaster completed remote builds in B191469: Diff 466762. Oct 11 2022, 4:38 AM

paulwalker-arm added inline comments. Oct 11 2022, 7:39 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1393	I don't see tests for this and some of the other MVTs. Do we need explicit handling for `MVT::v1i64` and `MVT::v1f64`? I would have thought these would just emit a scalar access, although there's no tests to show this.
1395	Is this check (plus the one for the float loop) necessary? I would expect that when `forceStreamingCompatibleSVE()` returns true we have no choice but to enable the custom lowering.
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
440–443	if (forceStreamingCompatibleSVE())
  return true;
Doing this might mean `useSVEForFixedLengthVectors` can remain defined in the header file.
452	Should this also include `|| hasSME()`?
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-stores.ll
27	The patch summary mentions lowering stores and you've added this test but I don't see any code to enable such lowering and hence we are seeing NEON str instructions.

hassnaa-arm removed a child revision: D135324: [AArch64-SVE]: force using SVE in streaming mode to lower arithmetic and logical fixed-width vector ops.. Oct 11 2022, 10:17 PM

Updated by parent patch

Add additional test cases

hassnaa-arm marked an inline comment as not done. Oct 13 2022, 5:36 AM

Harbormaster completed remote builds in B191951: Diff 467454. Oct 13 2022, 7:56 AM

Remove custom lowering of ISD::LOAD.
For fixed-length loads/stores, there is no need to custom-lower ISD::LOAD.
It was added because I thought that ldr is illegal in streaming mode, but it is legal, so custom lowering of ISD::LOAD is no longer needed.

Harbormaster completed remote builds in B192003: Diff 467531. Oct 13 2022, 11:53 AM

paulwalker-arm added inline comments. Oct 13 2022, 4:21 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1392–1400	I understand why this code block and `addTypeForStreamingSVE()` exist, but given they don't do anything within this patch anymore I think they're best moved into one of your other patches.
5757–5759	Is this still necessary now that you're no longer custom lowering `ISD::LOAD` for NEON sized vectors?
llvm/lib/Target/AArch64/AArch64InstrInfo.td
7137–7142	Up to you but personally I think `NotInStreamingSVEMode` reads better.

Remove 'addTypeForStreamingSVE()' as it's not needed now.
I no longer set a Custom operation action for any node, so there is no need for addTypeForStreamingSVE().

Harbormaster completed remote builds in B192160: Diff 467745. Oct 14 2022, 5:41 AM

hassnaa-arm marked 2 inline comments as done. Oct 14 2022, 8:16 AM

As discussed offline there's more we can do here to improve code quality but this'll come as you increase ISD node coverage. I've one minor issue but otherwise this looks good to me.

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
441	I guess this should match the code below, although I'm not quite sure why the assert is needed.

hassnaa-arm added inline comments. Oct 14 2022, 8:43 AM

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
441	If someone forces using streaming-compatible code, SVE must be available. (and given that it's not a user-exposed feature in Clang, it's fine for the compiler to crash if someone would use this feature while forgetting to set +sve somehow)

This revision was landed with ongoing or failed builds. Oct 14 2022, 10:47 AM

Closed by commit rG2c72d90ecc69: [AArch64-SVE]: Force generating code compatible to streaming mode. (authored by Hassnaa Hamdi <hassnaa.hamdi@arm.com>).

This revision was automatically updated to reflect the committed changes.

hassnaa-arm added a commit: rG2c72d90ecc69: [AArch64-SVE]: Force generating code compatible to streaming mode..

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	9 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td	21 lines
llvm/lib/Target/AArch64/AArch64Subtarget.h	5 lines
llvm/lib/Target/AArch64/AArch64Subtarget.cpp	12 lines
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-loads.ll	230 lines
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-stores.ll	258 lines

Diff 467838

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[... 1,383 lines not shown; context: if (Subtarget->hasSVE()) { ...]
    // NEON doesn't support 64-bit vector integer muls, but SVE does.
    setOperationAction(ISD::MUL, MVT::v1i64, Custom);
    setOperationAction(ISD::MUL, MVT::v2i64, Custom);

    // NEON doesn't support across-vector reductions, but SVE does.
    for (auto VT : {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32, MVT::v2f64})
      setOperationAction(ISD::VECREDUCE_SEQ_FADD, VT, Custom);

    // NOTE: Currently this has to happen after computeRegisterProperties rather
    // than the preferred option of combining it with the addRegisterClass call.
    if (Subtarget->useSVEForFixedLengthVectors()) {
      for (MVT VT : MVT::integer_fixedlen_vector_valuetypes())
        if (useSVEForFixedLengthVectorVT(VT))
          addTypeForFixedLengthSVE(VT);
      for (MVT VT : MVT::fp_fixedlen_vector_valuetypes())
        if (useSVEForFixedLengthVectorVT(VT))
          addTypeForFixedLengthSVE(VT);

      // 64bit results can mean a bigger than NEON input.
      for (auto VT : {MVT::v8i8, MVT::v4i16})
        setOperationAction(ISD::TRUNCATE, VT, Custom);
      setOperationAction(ISD::FP_ROUND, MVT::v4f16, Custom);

      // 128bit results imply a bigger than NEON input.
      for (auto VT : {MVT::v16i8, MVT::v8i16, MVT::v4i32})
[... 4,340 lines not shown; context: case ISD::SIGN_EXTEND_INREG: { ...]
      return LowerToPredicatedOp(Op, DAG,
                                 AArch64ISD::SIGN_EXTEND_INREG_MERGE_PASSTHRU);
    }
    case ISD::TRUNCATE:
      return LowerTRUNCATE(Op, DAG);
    case ISD::MLOAD:
      return LowerMLOAD(Op, DAG);
    case ISD::LOAD:
-     if (useSVEForFixedLengthVectorVT(Op.getValueType()))
+     if (useSVEForFixedLengthVectorVT(Op.getValueType(),
+                                      Subtarget->forceStreamingCompatibleSVE()))
        return LowerFixedLengthVectorLoadToSVE(Op, DAG);
      return LowerLOAD(Op, DAG);
    case ISD::ADD:
    case ISD::AND:
    case ISD::SUB:
      return LowerToScalableOp(Op, DAG);
    case ISD::FMAXIMUM:
      return LowerToPredicatedOp(Op, DAG, AArch64ISD::FMAX_PRED);
    case ISD::FMAXNUM:
[... 5,283 lines not shown ...]

  SDValue AArch64TargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
                                                     SelectionDAG &DAG) const {
    SDLoc dl(Op);
    EVT VT = Op.getValueType();

    ShuffleVectorSDNode *SVN = cast<ShuffleVectorSDNode>(Op.getNode());

-   if (useSVEForFixedLengthVectorVT(VT))
+   if (useSVEForFixedLengthVectorVT(VT,
+                                    Subtarget->forceStreamingCompatibleSVE()))
      return LowerFixedLengthVECTOR_SHUFFLEToSVE(Op, DAG);

    // Convert shuffles that are directly supported on NEON to target-specific
    // DAG nodes, instead of keeping them as shuffles and matching them again
    // during code selection. This is more efficient and avoids the possibility
    // of inconsistencies between legalization and selection.
    ArrayRef<int> ShuffleMask = SVN->getMask();

[... 673 lines not shown; context: static SDValue ConstantBuildVector(SDValue Op, SelectionDAG &DAG) { ...]

    return SDValue();
  }

  SDValue AArch64TargetLowering::LowerBUILD_VECTOR(SDValue Op,
                                                   SelectionDAG &DAG) const {
    EVT VT = Op.getValueType();

-   if (useSVEForFixedLengthVectorVT(VT)) {
+   if (useSVEForFixedLengthVectorVT(VT,
+                                    Subtarget->forceStreamingCompatibleSVE())) {
      if (auto SeqInfo = cast<BuildVectorSDNode>(Op)->isConstantSequence()) {
        SDLoc DL(Op);
        EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);
        SDValue Start = DAG.getConstant(SeqInfo->first, DL, ContainerVT);
        SDValue Steps = DAG.getStepVector(DL, ContainerVT, SeqInfo->second);
        SDValue Seq = DAG.getNode(ISD::ADD, DL, ContainerVT, Start, Steps);
        return convertFromScalableVector(DAG, Op.getValueType(), Seq);
      }
▲ Show 20 Lines • Show All 419 Lines • ▼ Show 20 Lines	if (VT.getScalarType() == MVT::i1) {
SDValue Extend =		SDValue Extend =
DAG.getNode(ISD::ANY_EXTEND, DL, VectorVT, Op.getOperand(0));		DAG.getNode(ISD::ANY_EXTEND, DL, VectorVT, Op.getOperand(0));
MVT ExtractTy = VectorVT == MVT::nxv2i64 ? MVT::i64 : MVT::i32;		MVT ExtractTy = VectorVT == MVT::nxv2i64 ? MVT::i64 : MVT::i32;
SDValue Extract = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ExtractTy,		SDValue Extract = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ExtractTy,
Extend, Op.getOperand(1));		Extend, Op.getOperand(1));
return DAG.getAnyExtOrTrunc(Extract, DL, Op.getValueType());		return DAG.getAnyExtOrTrunc(Extract, DL, Op.getValueType());
}		}

if (useSVEForFixedLengthVectorVT(VT))		if (useSVEForFixedLengthVectorVT(VT))
		sdesmalen [Done]: nit: comment can be removed?
		sdesmalen [Done]: nit: it seems you missed this one (can be removed)
return LowerFixedLengthExtractVectorElt(Op, DAG);		return LowerFixedLengthExtractVectorElt(Op, DAG);

// Check for non-constant or out of range lane.		// Check for non-constant or out of range lane.
ConstantSDNode *CI = dyn_cast<ConstantSDNode>(Op.getOperand(1));		ConstantSDNode *CI = dyn_cast<ConstantSDNode>(Op.getOperand(1));
if (!CI || CI->getZExtValue() >= VT.getVectorNumElements())		if (!CI || CI->getZExtValue() >= VT.getVectorNumElements())
return SDValue();		return SDValue();

// Insertion/extraction are legal for V128 types.		// Insertion/extraction are legal for V128 types.
▲ Show 20 Lines • Show All 346 Lines • ▼ Show 20 Lines	SDValue AArch64TargetLowering::LowerVectorSRA_SRL_SHL(SDValue Op,
int64_t Cnt;		int64_t Cnt;

if (!Op.getOperand(1).getValueType().isVector())		if (!Op.getOperand(1).getValueType().isVector())
return Op;		return Op;
unsigned EltSize = VT.getScalarSizeInBits();		unsigned EltSize = VT.getScalarSizeInBits();

switch (Op.getOpcode()) {		switch (Op.getOpcode()) {
case ISD::SHL:		case ISD::SHL:
if (VT.isScalableVector() || useSVEForFixedLengthVectorVT(VT))		if (VT.isScalableVector() || useSVEForFixedLengthVectorVT(VT))
		sdesmalen [Done]: nit: comment can be removed?
return LowerToPredicatedOp(Op, DAG, AArch64ISD::SHL_PRED);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::SHL_PRED);

		sdesmalen [Not Done]: Are you doing these as part of the same patch because tests like `sve-fixed-length-int-shifts.ll` require loads and stores to work? I wonder if you can still split out the unpredicated, non-extending/truncating loads/stores from this patch, such that you can have:
		1. patch for basic loads/stores + anything required to make these work (including tests)
		2. patches for shifts, concat_vectors, build_vector, vector_shuffle, extract_vector_elt, extract_subvector (or at least as many that are not required for (1)), with corresponding tests (e.g. `sve-fixed-length-int-shifts.ll` -> `sve-streaming-compatible-fixed-length-int-shifts.ll`), because these tests are currently missing.
		3. patch for masked and extending/truncating loads and stores.
if (isVShiftLImm(Op.getOperand(1), VT, false, Cnt) && Cnt < EltSize)		if (isVShiftLImm(Op.getOperand(1), VT, false, Cnt) && Cnt < EltSize)
return DAG.getNode(AArch64ISD::VSHL, DL, VT, Op.getOperand(0),		return DAG.getNode(AArch64ISD::VSHL, DL, VT, Op.getOperand(0),
DAG.getConstant(Cnt, DL, MVT::i32));		DAG.getConstant(Cnt, DL, MVT::i32));
return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, VT,		return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, VT,
DAG.getConstant(Intrinsic::aarch64_neon_ushl, DL,		DAG.getConstant(Intrinsic::aarch64_neon_ushl, DL,
MVT::i32),		MVT::i32),
Op.getOperand(0), Op.getOperand(1));		Op.getOperand(0), Op.getOperand(1));
case ISD::SRA:		case ISD::SRA:
case ISD::SRL:		case ISD::SRL:
if (VT.isScalableVector() || useSVEForFixedLengthVectorVT(VT)) {		if (VT.isScalableVector() || useSVEForFixedLengthVectorVT(VT)) {
		sdesmalen [Done]: nit: comment can be removed?
unsigned Opc = Op.getOpcode() == ISD::SRA ? AArch64ISD::SRA_PRED		unsigned Opc = Op.getOpcode() == ISD::SRA ? AArch64ISD::SRA_PRED
: AArch64ISD::SRL_PRED;		: AArch64ISD::SRL_PRED;
return LowerToPredicatedOp(Op, DAG, Opc);		return LowerToPredicatedOp(Op, DAG, Opc);
}		}

// Right shift immediate		// Right shift immediate
if (isVShiftRImm(Op.getOperand(1), VT, false, Cnt) && Cnt < EltSize) {		if (isVShiftRImm(Op.getOperand(1), VT, false, Cnt) && Cnt < EltSize) {
unsigned Opc =		unsigned Opc =

llvm/lib/Target/AArch64/AArch64InstrInfo.td


Show First 20 Lines • Show All 214 Lines • ▼ Show 20 Lines	def UseAlternateSExtLoadCVTF32
: Predicate<"Subtarget->useAlternateSExtLoadCVTF32Pattern()">;		: Predicate<"Subtarget->useAlternateSExtLoadCVTF32Pattern()">;

def UseNegativeImmediates		def UseNegativeImmediates
: Predicate<"false">, AssemblerPredicate<(all_of (not FeatureNoNegativeImmediates)),		: Predicate<"false">, AssemblerPredicate<(all_of (not FeatureNoNegativeImmediates)),
"NegativeImmediates">;		"NegativeImmediates">;

def UseScalarIncVL : Predicate<"Subtarget->useScalarIncVL()">;		def UseScalarIncVL : Predicate<"Subtarget->useScalarIncVL()">;

		def NotInStreamingSVEMode : Predicate<"!Subtarget->forceStreamingCompatibleSVE()">;

def AArch64LocalRecover : SDNode<"ISD::LOCAL_RECOVER",		def AArch64LocalRecover : SDNode<"ISD::LOCAL_RECOVER",
SDTypeProfile<1, 1, [SDTCisSameAs<0, 1>,		SDTypeProfile<1, 1, [SDTCisSameAs<0, 1>,
SDTCisInt<1>]>>;		SDTCisInt<1>]>>;


//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// AArch64-specific DAG Nodes.		// AArch64-specific DAG Nodes.
//		//
▲ Show 20 Lines • Show All 6,896 Lines • ▼ Show 20 Lines
}]>;		}]>;

def : Ld1Lane128IdxOpPat<extloadi16, VectorIndexS, v4i32, i32, LD1i16, VectorIndexStoH>;		def : Ld1Lane128IdxOpPat<extloadi16, VectorIndexS, v4i32, i32, LD1i16, VectorIndexStoH>;
def : Ld1Lane128IdxOpPat<extloadi8, VectorIndexS, v4i32, i32, LD1i8, VectorIndexStoB>;		def : Ld1Lane128IdxOpPat<extloadi8, VectorIndexS, v4i32, i32, LD1i8, VectorIndexStoB>;
def : Ld1Lane128IdxOpPat<extloadi8, VectorIndexH, v8i16, i32, LD1i8, VectorIndexHtoB>;		def : Ld1Lane128IdxOpPat<extloadi8, VectorIndexH, v8i16, i32, LD1i8, VectorIndexHtoB>;

// Same as above, but the first element is populated using		// Same as above, but the first element is populated using
// scalar_to_vector + insert_subvector instead of insert_vector_elt.		// scalar_to_vector + insert_subvector instead of insert_vector_elt.
		let Predicates = [NotInStreamingSVEMode] in {
class Ld1Lane128FirstElm<ValueType ResultTy, ValueType VecTy,		class Ld1Lane128FirstElm<ValueType ResultTy, ValueType VecTy,
SDPatternOperator ExtLoad, Instruction LD1>		SDPatternOperator ExtLoad, Instruction LD1>
: Pat<(ResultTy (scalar_to_vector (i32 (ExtLoad GPR64sp:$Rn)))),		: Pat<(ResultTy (scalar_to_vector (i32 (ExtLoad GPR64sp:$Rn)))),
(ResultTy (EXTRACT_SUBREG		(ResultTy (EXTRACT_SUBREG
(LD1 (VecTy (IMPLICIT_DEF)), 0, GPR64sp:$Rn), dsub))>;		(LD1 (VecTy (IMPLICIT_DEF)), 0, GPR64sp:$Rn), dsub))>;
		paulwalker-arm [Done]: Up to you but personally I think `NotInStreamingSVEMode` reads better.

def : Ld1Lane128FirstElm<v2i32, v8i16, extloadi16, LD1i16>;		def : Ld1Lane128FirstElm<v2i32, v8i16, extloadi16, LD1i16>;
def : Ld1Lane128FirstElm<v2i32, v16i8, extloadi8, LD1i8>;		def : Ld1Lane128FirstElm<v2i32, v16i8, extloadi8, LD1i8>;
def : Ld1Lane128FirstElm<v4i16, v16i8, extloadi8, LD1i8>;		def : Ld1Lane128FirstElm<v4i16, v16i8, extloadi8, LD1i8>;
		}
class Ld1Lane64Pat<SDPatternOperator scalar_load, Operand VecIndex,		class Ld1Lane64Pat<SDPatternOperator scalar_load, Operand VecIndex,
ValueType VTy, ValueType STy, Instruction LD1>		ValueType VTy, ValueType STy, Instruction LD1>
: Pat<(vector_insert (VTy VecListOne64:$Rd),		: Pat<(vector_insert (VTy VecListOne64:$Rd),
(STy (scalar_load GPR64sp:$Rn)), VecIndex:$idx),		(STy (scalar_load GPR64sp:$Rn)), VecIndex:$idx),
(EXTRACT_SUBREG		(EXTRACT_SUBREG
(LD1 (SUBREG_TO_REG (i32 0), VecListOne64:$Rd, dsub),		(LD1 (SUBREG_TO_REG (i32 0), VecListOne64:$Rd, dsub),
VecIndex:$idx, GPR64sp:$Rn),		VecIndex:$idx, GPR64sp:$Rn),
dsub)>;		dsub)>;

llvm/lib/Target/AArch64/AArch64Subtarget.h

Show First 20 Lines • Show All 362 Lines • ▼ Show 20 Lines	#include "AArch64GenSubtargetInfo.inc"
}		}

unsigned getMinSVEVectorSizeInBits() const {		unsigned getMinSVEVectorSizeInBits() const {
assert(HasSVE && "Tried to get SVE vector length without SVE support!");		assert(HasSVE && "Tried to get SVE vector length without SVE support!");
return MinSVEVectorSizeInBits;		return MinSVEVectorSizeInBits;
}		}

bool useSVEForFixedLengthVectors() const {		bool useSVEForFixedLengthVectors() const {
		if (forceStreamingCompatibleSVE())
		return true;

// Prefer NEON unless larger SVE registers are available.		// Prefer NEON unless larger SVE registers are available.
return hasSVE() && getMinSVEVectorSizeInBits() >= 256;		return hasSVE() && getMinSVEVectorSizeInBits() >= 256;
}		}

		bool forceStreamingCompatibleSVE() const;

unsigned getVScaleForTuning() const { return VScaleForTuning; }		unsigned getVScaleForTuning() const { return VScaleForTuning; }

const char* getChkStkName() const {		const char* getChkStkName() const {
if (isWindowsArm64EC())		if (isWindowsArm64EC())
return "__chkstk_arm64ec";		return "__chkstk_arm64ec";
return "__chkstk";		return "__chkstk";
}		}
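
The decision made by the updated `useSVEForFixedLengthVectors()` can be modeled outside LLVM roughly as follows. This is a minimal sketch rather than code from the patch: `SVEConfig` and its fields are hypothetical stand-ins for the subtarget state, and the real accessor additionally asserts that SVE is present before reading the minimum vector size.

```cpp
// Hypothetical stand-in for the relevant subtarget state.
struct SVEConfig {
  bool ForceStreamingCompatible;   // models forceStreamingCompatibleSVE()
  bool HasSVE;                     // models hasSVE()
  unsigned MinSVEVectorSizeInBits; // models getMinSVEVectorSizeInBits()
};

// Mirrors the control flow shown in the diff above.
bool useSVEForFixedLengthVectors(const SVEConfig &C) {
  // Streaming-compatible code must avoid NEON, so SVE is always chosen.
  if (C.ForceStreamingCompatible)
    return true;
  // Otherwise prefer NEON unless larger SVE registers are available.
  return C.HasSVE && C.MinSVEVectorSizeInBits >= 256;
}
```

In this model the flag overrides the 256-bit threshold, which matches the tests below being run without any `-aarch64-sve-vector-bits-min=` option.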


llvm/lib/Target/AArch64/AArch64Subtarget.cpp

// allocator, but can still be used as ABI requests, such as passing arguments		// allocator, but can still be used as ABI requests, such as passing arguments
// to function call.		// to function call.
static cl::list<std::string>		static cl::list<std::string>
ReservedRegsForRA("reserve-regs-for-regalloc", cl::desc("Reserve physical "		ReservedRegsForRA("reserve-regs-for-regalloc", cl::desc("Reserve physical "
"registers, so they can't be used by register allocator. "		"registers, so they can't be used by register allocator. "
"Should only be used for testing register allocator."),		"Should only be used for testing register allocator."),
cl::CommaSeparated, cl::Hidden);		cl::CommaSeparated, cl::Hidden);

		static cl::opt<bool>
		ForceStreamingCompatibleSVE("force-streaming-compatible-sve",
		paulwalker-arm [Not Done]: Can this be "force-streaming-compatible-mode"? which I believe better reflects your intent.
		sdesmalen [Not Done]: Can you rename this variable to `ForceStreamingCompatibleSVE`? (likewise change the name of the flag to `-force-streaming-compatible-sve`) The current name `ForceSVEWhenStreamingCompatible` suggests using the full range of SVE instructions when in streaming-compatible mode, even the instructions that would be illegal in that mode, but that would be incorrect.
		hassnaa-arm (author) [Done]: So there are some SVE instructions that are illegal in streaming mode? Like what? I was checking only for illegal NEON instructions, not illegal SVE instructions.
		sdesmalen [Not Done]: Streaming SVE is a subset of SVE, in that some SVE instructions (e.g. gather/scatter) are not valid in Streaming Mode.
		sdesmalen [Done]: nit: I think that 'mode' is kind of implied from 'streaming compatible', so you can remove `-mode` from the name, i.e. `force-streaming-mode-compatible-sve -> force-streaming-compatible-sve`. Same request for the name of the variable, i.e. `ForceStreamingModeCompatibleSVE -> ForceStreamingCompatibleSVE`
		cl::init(false), cl::Hidden);

unsigned AArch64Subtarget::getVectorInsertExtractBaseCost() const {		unsigned AArch64Subtarget::getVectorInsertExtractBaseCost() const {
if (OverrideVectorInsertExtractBaseCost.getNumOccurrences() > 0)		if (OverrideVectorInsertExtractBaseCost.getNumOccurrences() > 0)
return OverrideVectorInsertExtractBaseCost;		return OverrideVectorInsertExtractBaseCost;
return VectorInsertExtractBaseCost;		return VectorInsertExtractBaseCost;
}		}

AArch64Subtarget &AArch64Subtarget::initializeSubtargetDependencies(		AArch64Subtarget &AArch64Subtarget::initializeSubtargetDependencies(
StringRef FS, StringRef CPUString, StringRef TuneCPUString) {		StringRef FS, StringRef CPUString, StringRef TuneCPUString) {
▲ Show 20 Lines • Show All 350 Lines • ▼ Show 20 Lines	void AArch64Subtarget::mirFileLoaded(MachineFunction &MF) const {
// bogus values after PEI has eliminated the callframe setup/destroy pseudo		// bogus values after PEI has eliminated the callframe setup/destroy pseudo
// instructions, specify explicitly if you need it to be correct.		// instructions, specify explicitly if you need it to be correct.
MachineFrameInfo &MFI = MF.getFrameInfo();		MachineFrameInfo &MFI = MF.getFrameInfo();
if (!MFI.isMaxCallFrameSizeComputed())		if (!MFI.isMaxCallFrameSizeComputed())
MFI.computeMaxCallFrameSize(MF);		MFI.computeMaxCallFrameSize(MF);
}		}

bool AArch64Subtarget::useAA() const { return UseAA; }		bool AArch64Subtarget::useAA() const { return UseAA; }

		bool AArch64Subtarget::forceStreamingCompatibleSVE() const {
		if (ForceStreamingCompatibleSVE) {
		paulwalker-arm [Not Done]: This is going to enable a lot more code paths than is currently tested. Can you explain the rationale for the new flag? Does it relate to SME's streaming-compatible mode, or is it wanted for other reasons?
		Matt [Not Done]:
		> This is going to enable a lot more code paths than is currently tested.
		Out of curiosity, I've run a quick check for a potentially related issue, https://github.com/llvm/llvm-project/issues/56412
		I'm no longer encountering the ICE when compiling "sve-fixed-length-masked-gather.ll" with this option enabled, as in `llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll`. At the same time, there's no impact on the generated code (identical assembly); in any case, that's strictly better than ICE.
		@hassnaa-arm, I'm wondering, just to be on the safe side, could you possibly run a quick check on your end to make sure that you're not encountering any issues with "sve-fixed-length-masked-gather.ll", either? That, and perhaps even add a RUN line with `llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128` to that file (as it has caused an ICE with 128-bit SVE compilation before), if that would be no trouble?
		@paulwalker-arm, If this patch gets accepted and the above test works fine perhaps that would offer a way to close https://github.com/llvm/llvm-project/issues/56412?
		Matt [Not Done]: Update: I've tested on the whole file and the ICE does appear, after all. The difference is that now the affected function is `masked_gather_v8f16` (whereas previously compiling `masked_gather_v2f16` alone was sufficient to trigger the ICE; now it no longer does). After compiling with `llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll`:
		PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
		Stack dump:
		0. Program arguments: llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll
		1. Running pass 'Function Pass Manager' on module 'sve-fixed-length-masked-gather.ll'.
		2. Running pass 'AArch64 Instruction Selection' on function '@masked_gather_v8f16'
		#0 0x00007f1e7892db26 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /path/to/llvm-project/llvm/lib/Support/Unix/Signals.inc:573:3
		#1 0x00007f1e7892b9ad llvm::sys::RunSignalHandlers() /path/to/llvm-project/llvm/lib/Support/Signals.cpp:103:20
		#2 0x00007f1e7892bb2c SignalHandler(int) /path/to/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
		#3 0x00007f1e77a42210 __restore_rt (/lib64/libc.so.6+0x3a210)
		#4 0x00007f1e7bec836c llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33
		#5 0x00007f1e7bec836c llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14
		#6 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36
		#7 0x00007f1e7bec847b llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode) (.part.0) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:525:32
		#8 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33
		#9 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14
		#10 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36
		#11 0x00007f1e7bec847b llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode) (.part.0) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:525:32
		#12 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33
		#13 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14
		#14 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36
		. . . (the remaining part similarly recurring as in the aforementioned GitHub issue).
		assert((hasSVE() || hasSME()) && "Expected SVE to be available");
		sdesmalen [Done]: Should this return `true` always and instead have `assert(hasSVE() && "Expected SVE to be available")` ? If someone forces using streaming-compatible code, SVE must be available. (and given that it's not a user-exposed feature in Clang, it's fine for the compiler to crash if someone would use this feature while forgetting to set `+sve` somehow)
		paulwalker-arm [Not Done]: I guess this should match the code below, although I'm not quite sure why the assert is needed.
		hassnaa-arm (author) [Done]: If someone forces using streaming-compatible code, SVE must be available. (and given that it's not a user-exposed feature in Clang, it's fine for the compiler to crash if someone would use this feature while forgetting to set +sve somehow)
		return hasSVE() || hasSME();
		}
		paulwalker-arm [Done]: `if (forceStreamingCompatibleSVE()) return true;` Doing this might mean `useSVEForFixedLengthVectors` can remain defined in the header file.
		return false;
		}
		sdesmalen [Done]: nit: `forceStreamingCompatibleSVE` (see other comment above)
		paulwalker-arm [Not Done]: Should this also include `|| hasSME()`?
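
To make the outcome of the discussion above easier to follow, here is a standalone sketch of the predicate as it appears in this revision. It is an illustration only: `SubtargetModel` and its fields are hypothetical, and `ForceFlag` stands in for the hidden `-force-streaming-compatible-sve` option.

```cpp
#include <cassert>

// Hypothetical stand-in for the flag and feature bits consulted by the patch.
struct SubtargetModel {
  bool ForceFlag; // models the ForceStreamingCompatibleSVE cl::opt
  bool HasSVE;    // models hasSVE()
  bool HasSME;    // models hasSME()
};

// Mirrors AArch64Subtarget::forceStreamingCompatibleSVE() as shown above.
bool forceStreamingCompatibleSVE(const SubtargetModel &ST) {
  if (ST.ForceFlag) {
    // Forcing streaming-compatible code without SVE or SME is a usage error.
    assert((ST.HasSVE || ST.HasSME) && "Expected SVE to be available");
    return ST.HasSVE || ST.HasSME;
  }
  return false;
}
```

With assertions enabled, setting the flag without `+sve` or `+sme` aborts, which is the behaviour sdesmalen argues is acceptable for an option that is not exposed through Clang.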

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-loads.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -force-streaming-compatible-sve < %s | FileCheck %s

				target triple = "aarch64-unknown-linux-gnu"

				define <4 x i8> @load_v4i8(<4 x i8>* %a) #0 {
				; CHECK-LABEL: load_v4i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: ld1b { z0.h }, p0/z, [x0]
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%load = load <4 x i8>, <4 x i8>* %a
				ret <4 x i8> %load
				}

				define <8 x i8> @load_v8i8(<8 x i8>* %a) #0 {
				; CHECK-LABEL: load_v8i8:
				kmclaughlin [Done]: There don't seem to be any check lines for any of the `VBITS_GE_*` labels added here? Maybe if they are not needed you could remove the extra labels, or add some check lines to match the ones you need. I think fixing this will also remove the note added at the bottom of this test.
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr d0, [x0]
				; CHECK-NEXT: ret
				%load = load <8 x i8>, <8 x i8>* %a
				ret <8 x i8> %load
				}

				define <16 x i8> @load_v16i8(<16 x i8>* %a) #0 {
				; CHECK-LABEL: load_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr q0, [x0]
				; CHECK-NEXT: ret
				%load = load <16 x i8>, <16 x i8>* %a
				ret <16 x i8> %load
				}

				define <32 x i8> @load_v32i8(<32 x i8>* %a) #0 {
				; CHECK-LABEL: load_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ret
				%load = load <32 x i8>, <32 x i8>* %a
				ret <32 x i8> %load
				}

				define <2 x i16> @load_v2i16(<2 x i16>* %a) #0 {
				; CHECK-LABEL: load_v2i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldrh w8, [x0, #2]
				; CHECK-NEXT: ldrh w9, [x0]
				; CHECK-NEXT: fmov s0, w8
				; CHECK-NEXT: fmov s1, w9
				; CHECK-NEXT: zip1 z0.s, z1.s, z0.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%load = load <2 x i16>, <2 x i16>* %a
				ret <2 x i16> %load
				}

				define <2 x half> @load_v2f16(<2 x half>* %a) #0 {
				; CHECK-LABEL: load_v2f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr s0, [x0]
				; CHECK-NEXT: ret
				%load = load <2 x half>, <2 x half>* %a
				ret <2 x half> %load
				}

				define <4 x i16> @load_v4i16(<4 x i16>* %a) #0 {
				; CHECK-LABEL: load_v4i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr d0, [x0]
				; CHECK-NEXT: ret
				%load = load <4 x i16>, <4 x i16>* %a
				ret <4 x i16> %load
				}

				define <4 x half> @load_v4f16(<4 x half>* %a) #0 {
				; CHECK-LABEL: load_v4f16:
				kmclaughlin [Done]: Can you please add a test using a load of a type which is illegal for Neon, e.g. `32 x float`?
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr d0, [x0]
				; CHECK-NEXT: ret
				%load = load <4 x half>, <4 x half>* %a
				ret <4 x half> %load
				}

				define <8 x i16> @load_v8i16(<8 x i16>* %a) #0 {
				; CHECK-LABEL: load_v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr q0, [x0]
				; CHECK-NEXT: ret
				%load = load <8 x i16>, <8 x i16>* %a
				ret <8 x i16> %load
				}

				define <8 x half> @load_v8f16(<8 x half>* %a) #0 {
				; CHECK-LABEL: load_v8f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr q0, [x0]
				; CHECK-NEXT: ret
				%load = load <8 x half>, <8 x half>* %a
				ret <8 x half> %load
				}

				define <16 x i16> @load_v16i16(<16 x i16>* %a) #0 {
				; CHECK-LABEL: load_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ret
				%load = load <16 x i16>, <16 x i16>* %a
				ret <16 x i16> %load
				}

				define <16 x half> @load_v16f16(<16 x half>* %a) #0 {
				; CHECK-LABEL: load_v16f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ret
				%load = load <16 x half>, <16 x half>* %a
				ret <16 x half> %load
				}

				define <2 x i32> @load_v2i32(<2 x i32>* %a) #0 {
				; CHECK-LABEL: load_v2i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr d0, [x0]
				; CHECK-NEXT: ret
				%load = load <2 x i32>, <2 x i32>* %a
				ret <2 x i32> %load
				}

				define <2 x float> @load_v2f32(<2 x float>* %a) #0 {
				; CHECK-LABEL: load_v2f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr d0, [x0]
				; CHECK-NEXT: ret
				%load = load <2 x float>, <2 x float>* %a
				ret <2 x float> %load
				}

				define <4 x i32> @load_v4i32(<4 x i32>* %a) #0 {
				; CHECK-LABEL: load_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr q0, [x0]
				; CHECK-NEXT: ret
				%load = load <4 x i32>, <4 x i32>* %a
				ret <4 x i32> %load
				}

				define <4 x float> @load_v4f32(<4 x float>* %a) #0 {
				; CHECK-LABEL: load_v4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr q0, [x0]
				; CHECK-NEXT: ret
				%load = load <4 x float>, <4 x float>* %a
				ret <4 x float> %load
				}

				define <8 x i32> @load_v8i32(<8 x i32>* %a) #0 {
				; CHECK-LABEL: load_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ret
				%load = load <8 x i32>, <8 x i32>* %a
				ret <8 x i32> %load
				}

				define <8 x float> @load_v8f32(<8 x float>* %a) #0 {
				; CHECK-LABEL: load_v8f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ret
				%load = load <8 x float>, <8 x float>* %a
				ret <8 x float> %load
				}

				define <1 x i64> @load_v1i64(<1 x i64>* %a) #0 {
				; CHECK-LABEL: load_v1i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr d0, [x0]
				; CHECK-NEXT: ret
				%load = load <1 x i64>, <1 x i64>* %a
				ret <1 x i64> %load
				}

				define <1 x double> @load_v1f64(<1 x double>* %a) #0 {
				; CHECK-LABEL: load_v1f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr d0, [x0]
				; CHECK-NEXT: ret
				%load = load <1 x double>, <1 x double>* %a
				ret <1 x double> %load
				}

				define <2 x i64> @load_v2i64(<2 x i64>* %a) #0 {
				; CHECK-LABEL: load_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr q0, [x0]
				; CHECK-NEXT: ret
				%load = load <2 x i64>, <2 x i64>* %a
				ret <2 x i64> %load
				}

				define <2 x double> @load_v2f64(<2 x double>* %a) #0 {
				; CHECK-LABEL: load_v2f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldr q0, [x0]
				; CHECK-NEXT: ret
				%load = load <2 x double>, <2 x double>* %a
				ret <2 x double> %load
				}

				define <4 x i64> @load_v4i64(<4 x i64>* %a) #0 {
				; CHECK-LABEL: load_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ret
				%load = load <4 x i64>, <4 x i64>* %a
				ret <4 x i64> %load
				}

				define <4 x double> @load_v4f64(<4 x double>* %a) #0 {
				; CHECK-LABEL: load_v4f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ret
				%load = load <4 x double>, <4 x double>* %a
				ret <4 x double> %load
				}


				attributes #0 = { "target-features"="+sve" }

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-stores.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -force-streaming-compatible-sve < %s | FileCheck %s

				target triple = "aarch64-unknown-linux-gnu"

				define void @store_v4i8(<4 x i8>* %a) #0 {
				; CHECK-LABEL: store_v4i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI0_0
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: ldr d0, [x8, :lo12:.LCPI0_0]
				; CHECK-NEXT: st1b { z0.h }, p0, [x0]
				; CHECK-NEXT: ret
				store <4 x i8> zeroinitializer, <4 x i8>* %a
				ret void
				}

				define void @store_v8i8(<8 x i8>* %a) #0 {
				paulwalker-arm [Done]: You shouldn't need to test all these combinations. It should be sufficient to test without any `-aarch64-sve-vector-bits-min=` options as that's the expected use case. For the tests themselves you want to add some that use 256-bit vectors to verify we don't emit NEON instructions as part of type legalisation.
				; CHECK-LABEL: store_v8i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI1_0
				; CHECK-NEXT: ldr d0, [x8, :lo12:.LCPI1_0]
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				store <8 x i8> zeroinitializer, <8 x i8>* %a
				ret void
				}
				paulwalker-arm [Not Done]: The patch summary mentions lowering stores and you've added this test but I don't see any code to enable such lowering and hence we are seeing NEON str instructions.

				define void @store_v16i8(<16 x i8>* %a) #0 {
				; CHECK-LABEL: store_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI2_0
				; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI2_0]
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				store <16 x i8> zeroinitializer, <16 x i8>* %a
				ret void
				}

				define void @store_v32i8(<32 x i8>* %a) #0 {
				; CHECK-LABEL: store_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI3_0
				; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI3_0]
				; CHECK-NEXT: stp q0, q0, [x0]
				; CHECK-NEXT: ret
				store <32 x i8> zeroinitializer, <32 x i8>* %a
				ret void
				}

				define void @store_v2i16(<2 x i16>* %a) #0 {
				; CHECK-LABEL: store_v2i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI4_0
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: ldr d0, [x8, :lo12:.LCPI4_0]
				; CHECK-NEXT: st1h { z0.s }, p0, [x0]
				; CHECK-NEXT: ret
				store <2 x i16> zeroinitializer, <2 x i16>* %a
				ret void
				}

				define void @store_v2f16(<2 x half>* %a) #0 {
				; CHECK-LABEL: store_v2f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI5_0
				; CHECK-NEXT: ldr d0, [x8, :lo12:.LCPI5_0]
				; CHECK-NEXT: str s0, [x0]
				; CHECK-NEXT: ret
				store <2 x half> zeroinitializer, <2 x half>* %a
				ret void
				}

				define void @store_v4i16(<4 x i16>* %a) #0 {
				; CHECK-LABEL: store_v4i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI6_0
				; CHECK-NEXT: ldr d0, [x8, :lo12:.LCPI6_0]
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				store <4 x i16> zeroinitializer, <4 x i16>* %a
				ret void
				}

				define void @store_v4f16(<4 x half>* %a) #0 {
				; CHECK-LABEL: store_v4f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI7_0
				; CHECK-NEXT: ldr d0, [x8, :lo12:.LCPI7_0]
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				store <4 x half> zeroinitializer, <4 x half>* %a
				ret void
				}

				define void @store_v8i16(<8 x i16>* %a) #0 {
				; CHECK-LABEL: store_v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI8_0
				; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI8_0]
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				store <8 x i16> zeroinitializer, <8 x i16>* %a
				ret void
				}

				define void @store_v8f16(<8 x half>* %a) #0 {
				; CHECK-LABEL: store_v8f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI9_0
				; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI9_0]
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				store <8 x half> zeroinitializer, <8 x half>* %a
				ret void
				}

				define void @store_v16i16(<16 x i16>* %a) #0 {
				; CHECK-LABEL: store_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI10_0
				; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI10_0]
				; CHECK-NEXT: stp q0, q0, [x0]
				; CHECK-NEXT: ret
				store <16 x i16> zeroinitializer, <16 x i16>* %a
				ret void
				}

				define void @store_v16f16(<16 x half>* %a) #0 {
				; CHECK-LABEL: store_v16f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI11_0
				; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI11_0]
				; CHECK-NEXT: stp q0, q0, [x0]
				; CHECK-NEXT: ret
				store <16 x half> zeroinitializer, <16 x half>* %a
				ret void
				}

				define void @store_v2i32(<2 x i32>* %a) #0 {
				; CHECK-LABEL: store_v2i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str xzr, [x0]
				; CHECK-NEXT: ret
				store <2 x i32> zeroinitializer, <2 x i32>* %a
				ret void
				}

				define void @store_v2f32(<2 x float>* %a) #0 {
				; CHECK-LABEL: store_v2f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str xzr, [x0]
				; CHECK-NEXT: ret
				store <2 x float> zeroinitializer, <2 x float>* %a
				ret void
				}

				define void @store_v4i32(<4 x i32>* %a) #0 {
				; CHECK-LABEL: store_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: stp xzr, xzr, [x0]
				; CHECK-NEXT: ret
				store <4 x i32> zeroinitializer, <4 x i32>* %a
				ret void
				}

				define void @store_v4f32(<4 x float>* %a) #0 {
				; CHECK-LABEL: store_v4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: stp xzr, xzr, [x0]
				; CHECK-NEXT: ret
				store <4 x float> zeroinitializer, <4 x float>* %a
				ret void
				}

				define void @store_v8i32(<8 x i32>* %a) #0 {
				; CHECK-LABEL: store_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI16_0
				; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI16_0]
				; CHECK-NEXT: stp q0, q0, [x0]
				; CHECK-NEXT: ret
				store <8 x i32> zeroinitializer, <8 x i32>* %a
				ret void
				}

				define void @store_v8f32(<8 x float>* %a) #0 {
				; CHECK-LABEL: store_v8f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI17_0
				; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI17_0]
				; CHECK-NEXT: stp q0, q0, [x0]
				; CHECK-NEXT: ret
				store <8 x float> zeroinitializer, <8 x float>* %a
				ret void
				}

				define void @store_v1i64(<1 x i64>* %a) #0 {
				; CHECK-LABEL: store_v1i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: fmov d0, xzr
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				store <1 x i64> zeroinitializer, <1 x i64>* %a
				ret void
				}

				define void @store_v1f64(<1 x double>* %a) #0 {
				; CHECK-LABEL: store_v1f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: movi d0, #0000000000000000
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				store <1 x double> zeroinitializer, <1 x double>* %a
				ret void
				}

				define void @store_v2i64(<2 x i64>* %a) #0 {
				; CHECK-LABEL: store_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: stp xzr, xzr, [x0]
				; CHECK-NEXT: ret
				store <2 x i64> zeroinitializer, <2 x i64>* %a
				ret void
				}

				define void @store_v2f64(<2 x double>* %a) #0 {
				; CHECK-LABEL: store_v2f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: stp xzr, xzr, [x0]
				; CHECK-NEXT: ret
				store <2 x double> zeroinitializer, <2 x double>* %a
				ret void
				}

				define void @store_v4i64(<4 x i64>* %a) #0 {
				; CHECK-LABEL: store_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI22_0
				; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI22_0]
				; CHECK-NEXT: stp q0, q0, [x0]
				; CHECK-NEXT: ret
				store <4 x i64> zeroinitializer, <4 x i64>* %a
				ret void
				}

				define void @store_v4f64(<4 x double>* %a) #0 {
				; CHECK-LABEL: store_v4f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI23_0
				; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI23_0]
				; CHECK-NEXT: stp q0, q0, [x0]
				; CHECK-NEXT: ret
				store <4 x double> zeroinitializer, <4 x double>* %a
				ret void
				}

				attributes #0 = { "target-features"="+sve" }

Diff 467838
