Force using SVE in streaming mode to lower arithmetic and logical fixed-width vector ops.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Hi @hassnaa-arm, could you rename the title to something that describes the patch a little more? I think something like
[AArch64][SVE]: Force the use of SVE to lower fixed-width arithmetic ops in streaming mode
would be a bit clearer. What do you think?
Hi @hassnaa-arm, it looks like this patch is based off D133433. Can you add that as a parent revision so it's obvious to the reviewer please? You can do this by clicking on "Edit Related Revisions" -> "Edit Parent Revisions" at the top-right corner of the page. Thanks!
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
4466 | Hi @hassnaa-arm, this change doesn't look right. I would expect it to break some tests? When we're not in streaming mode we also want to override NEON for 64-bit element types. Can you put the OverrideNEON flag back in, perhaps something like // If SVE is available then i64 vector multiplications can also be made legal. bool OverrideNEON = VT == MVT::v2i64 || VT == MVT::v1i64 || Subtarget->forceSVEInStreamingMode(); | |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-arith.ll | ||
92 | Again, this is illegal in streaming mode. | |
1588 | This looks like a NEON instruction - can you investigate where this is coming from? |
Thanks for this @hassnaa-arm! I had some comments about how to tidy up the tests a bit. I also think some there are some load/store test changes that shouldn't be part of this patch.
llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll | ||
---|---|---|
420 ↗ | (On Diff #465695) | nit: whitespace |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-ext-loads.ll | ||
8 ↗ | (On Diff #465695) | I don't think these changes should be part of this patch, since it's not changing loads and stores? |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-arith.ll | ||
13 | Could you also add a test for an illegal NEON type too, i.e. <4 x i8> or <2 x i16>? | |
92 | Please ignore this comment! stp q0, q1 is legal - my mistake! | |
473 | Again, could you add at least one illegal type - <4 x i8> or <2 x i16>? | |
1410 | Can you add an illegal NEON type such as <2 x i16>? | |
1588 | Please ignore this - my mistake! | |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-div.ll | ||
299 | This still has the vscale_range(16,0) attribute. Can you remove it and recreate the CHECK lines please? | |
1063 | Again, this still has the vscale_range(16,0) attribute. Can you remove it and regenerate the CHECK lines? | |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll | ||
17 | Can you add a test for an illegal type such as <4 x i8> too? | |
218 | Wow, this code surely gets an award for being so impressively bad?! | |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-rem.ll | ||
472 | I think that you can remove the tests greater than 512 bits, i.e. <128 x i8>. If the tests already work for <64 x i8> they are likely to work for anything larger too. | |
774 | Again, maybe remove this test since I'm not sure what extra value it gives us? | |
1608 | Again, maybe remove this test? | |
1775 | Again, maybe remove this test? | |
2240 | Again, maybe remove this test? | |
2335 | Again, maybe remove this test? | |
2652 | Again, maybe remove this test? | |
2747 | Again, maybe remove this test? |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-rem.ll | ||
---|---|---|
3384 | Again, maybe remove this test? | |
3686 | Again, maybe remove this test? | |
4520 | Again, maybe remove this test? | |
4687 | Again, maybe remove this test? | |
5152 | Again, maybe remove this test? | |
5247 | Again, maybe remove this test? | |
5564 | Again, maybe remove this test? | |
5659 | Again, maybe remove this test? | |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-loads.ll | ||
2 ↗ | (On Diff #465695) | Not part of this patch? |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-store.ll | ||
2 ↗ | (On Diff #465695) | Not part of this patch? |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-ext-loads.ll | ||
---|---|---|
8 ↗ | (On Diff #465695) | I'm sorry, it's by mistake. |
This looks a lot better now thanks @hassnaa-arm and the test files are much smaller too. I spotted two issues in a couple of tests where we are using illegal NEON instructions, e.g. bic. Would you be able to investigate these please?
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-div.ll | ||
---|---|---|
977 | This is a NEON vector instruction - this is definitely illegal in streaming mode. Can you try to find out why this is being inserted please? | |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll | ||
1212 | Again, these bic instructions are illegal in streaming mode. |
Fix invalid bic instruction in streaming mode.
Fix invalid bic instruction that was generated during 'and' combining by converting the fixed-length vector to scalable one to combine SVEAnd instead of and.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
15843 | nit: In LLVM the style is to start local variables with an upper-case, i.e. ScalableLHS. | |
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-arith.ll | ||
66–93 | I think this test can be removed, because you've already covered the "twice as wide" case (32 x i8) which ensures we don't emit any other instructions not valid in streaming mode. The "four times as wide' should already be covered by sve-fixed-length-int-arith.ll. | |
151 | This test can be removed for the same reason as mentioned above. | |
224 | This test can be removed for the same reason as mentioned above. | |
297 | This test can be removed for the same reason as mentioned above. (same for all other 4 x as wide instances in the remainder of this file and other files in this patch) |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
15840–15846 | This looks odd. We shouldn't really be doing lower within DAGCombine. What happens if you just exit the combine for the invalid case? That said, I can see functions like tryAdvSIMDModImm32() are used in other part of codegen so I'm wondering if the prevention logic is best place within such functions so all use cases are covered. | |
22471 | When we hit a similar issue with LowerToPredicatedOp() we decide to drop the calls to useSVEForFixedLengthVectorVT() in favour or just using VT.isFixedLengthVector() && isTypeLegal(VT). Would the same work in your case? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
22471 | Sorry, I don't understand. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
22471 | The former, so you can drop the call to useSVEForFixedLengthVectorVT() and instead have assert(VT.isFixedLengthVector() && isTypeLegal(VT) && .... By this point we should be working with only legal types and there's no harm in handling any of them. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
22471 | why do you suggest that instead of using useSVEForFixedLengthVectorVT() ? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
22471 | useSVEForFixedLengthVectorVT() is a semi-complex function that exists to choose which path to take during code generation and is thusly used to determine how to lower an ISD::ADD for example. However, by calling LowerToPredicatedOp() you've already made that decision and so you only need to detect scenarios that would result in broken code generation. For the case of LowerToPredicatedOp() this just means ensuring the input is a legal fixed length vector. |
Remove step of converting to scalable vector that was added within DAGCombine (performAndCombine)
Hi @hassnaa-arm, I think this patch is very close to being ready! However, do you know why the test file sve-streaming-fixed-length-int-shifts.ll was deleted?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1396 | I remember in one of your previous patches that @sdesmalen mentioned you shouldn't need to add v1i64 as it should be treated as a scalar. What happens if you remove it? I imagine your v1i64 tests might just generate scalar code? |
It was a fault while rebasing the parent patch to this patch.
In the parent patch, that deleted file was replaced by sve-streaming-mode-fixed-length-int-shifts.ll
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1396 | Yes, at that patch, there were no tests needing custom-lowering for v1i64. But in this patch, the test file of sve-streaming-mode-fixed-length-int-log.ll define <1 x i64> @and_v1i64(<1 x i64> %op1, <1 x i64> %op2) define <1 x i64> @xor_v1i64(<1 x i64> %op1, <1 x i64> %op2) |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1396 | I'll also add that in general you don't want to revert such integer types to scalar because that'll cause GPR-VPR transfers that can be expensive, perhaps even more so when it comes to streaming mode. You can also see that within LowerMUL we have special handling for v1i64 to keep it in the vector unit. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1396 | Does that mean we need coverage as well for v1i32? (and perhaps also v1i8, v1i16). If so, I wonder if that might warrant a separate patch rather than support the odd case here? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1396 | Discussed offline but for completeness the answer is no. The other MVTs you list are not type legal (only 64/128-bit vectors are legal for NEON) and so they'll not make it into operation legalisation. |
I remember in one of your previous patches that @sdesmalen mentioned you shouldn't need to add v1i64 as it should be treated as a scalar. What happens if you remove it? I imagine your v1i64 tests might just generate scalar code?