When a function is compiled to be in Streaming(-compatible) mode, the full
set of SVE instructions may not be available. This patch adds an interface
to query that and changes the codegen for FADDA (not legal in Streaming-SVE
mode) to instead be expanded for fixed-length vectors, or otherwise not to
code-generate for scalable vectors.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1462 | Here you are also changing the logic. | |
19563 | This does not look right. | |
llvm/lib/Target/AArch64/AArch64Subtarget.h | ||
219 | s/becuase/because/ |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1462 | By default VECREDUCE_SEQ_FADD is marked as 'Expand', which means for fixed-length vectors that it will expand the operation to a sequence of scalar options to do the vector reduction. For scalable vectors, no such scalarisation exists (because it would require SelectionDAG to generate a loop to do this), so the compiler will fail to compile. That is the expected behaviour, because if the target can't use these instruction due to the selected streaming-mode, then the intrinsic/operation should not have been formed in the LLVM IR in the first place. | |
19563 | Answered in my other comment. |
llvm/lib/Target/AArch64/AArch64Subtarget.cpp | ||
---|---|---|
495 | Typo, comaptible -> compatible. It might be nice to include the "-" on the front as well. | |
497–498 | Using just if (ForceStreamingCompatibleSVE) is passing check-llvm for me, so either this can be simplified or a test is missing. If a test is missing, this check is copy/pasted from isNeonAvailable above. We can separate this into a new function such as bool AArch64Subtarget::isForceStreamingCompatibleSVE() const{ return ForceStreamingCompatibleSVE.getNumOccurrences() > 0; } Then we can simplify things to bool AArch64Subtarget::isFullSVEAvailable() const{ return hasSVEorSME() && !StreamingSVEMode && !StreamingCompatibleSVEMode && !isForceStreamingCompatibleSVE(); } And provide a nice check from the subtarget. If ForceStreamingCompatibleSVE.getNumOccurrences() > 0; and ForceStreamingCompatibleSVE are equivalent then you can just do return hasSVEorSME() && !StreamingSVEMode && !StreamingCompatibleSVEMode && !ForceStreamingCompatibleSVEMode; | |
llvm/lib/Target/AArch64/AArch64Subtarget.h | ||
220–221 | Maybe its best just to mention the feature instead of explaining it here since there will likely be a more accurate description later that diverges from this text? |
llvm/lib/Target/AArch64/AArch64Subtarget.cpp | ||
---|---|---|
497–498 | You're right, in this case they are equivalent. |
llvm/lib/Target/AArch64/AArch64Subtarget.cpp | ||
---|---|---|
491 | Perhaps it's worth being consistent with isNeonAvailableand using the isStreaming and isStreamingCompatible interfaces? | |
llvm/lib/Target/AArch64/AArch64Subtarget.h | ||
228 | Do we need to also check the same get-out clause that we do in AArch64TargetTransformInfo.cpp, i.e. EnableScalableAutovecInStreamingMode? |
Addressed @david-arm's comments and put the check for ForceStreamingCompatibleSVE into isStreamingCompatible(), as that makes the logic a bit simpler.
llvm/lib/Target/AArch64/AArch64Subtarget.cpp | ||
---|---|---|
491 | Thanks, good point. | |
llvm/lib/Target/AArch64/AArch64Subtarget.h | ||
228 | Possibly, but that requires changing the interface for getMinVectorRegisterBitWidth to also take the RegisterKind. |
llvm/lib/Target/AArch64/AArch64Subtarget.cpp | ||
---|---|---|
491 | Is the presence of SME relevant to this question? Just hasSVE() seems more fitting. | |
llvm/lib/Target/AArch64/AArch64Subtarget.h | ||
221 | I don't like this name. FullSVE isn't really a thing, you either have SVE or you don't, much like with NEON. Can this just be isSVEAvailable()? | |
llvm/test/CodeGen/AArch64/sve-fp-reduce-fadda.ll | ||
4–6 | I'm not sure it worth having RUN lines that we know will crash? If we care about ensuring such IR isn't code generated then we should update the IR Verifier to generate a clean failure rather than require a compiler crash. |
llvm/lib/Target/AArch64/AArch64Subtarget.h | ||
---|---|---|
221 | I'm happy to change it to isSVEAvailable(), but it's perhaps a bit confusing in the context of other interfaces such as useSVEforFixedLengthVectors, where SVE has a meaning where it allows for the streaming-compatible subset of SVE, whereas for isSVEAvailable it strictly relates to the full set of SVE instructions. | |
llvm/test/CodeGen/AArch64/sve-fp-reduce-fadda.ll | ||
4–6 | I'd rather keep the RUN line to make sure the code I added in the patch is protected by a test and therefore cannot be removed without breaking something. |
Renamed isFullSVEAvailable -> isSVEAvailable
Changed hasSVEorSME() -> hasSVE() in isSVEAvailable
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19564–19566 | As discussed this shouldn't be necessary because we require users of the target specific SVE intrinsics to know what they're doing. There are several cases where incorrect usage of these intrinsics can lead to unexpected behaviour and we make no attempt to guard against them. | |
llvm/lib/Target/AArch64/AArch64Subtarget.h | ||
221 | For clarity I interpret useSVEforFixedLengthVectors as just "use SVE instructions for fixed length vectors". It makes no judgement whether such instructions are streaming-compatible, it's just that most are and the few that are not use different code paths based on a more specific query like isSVEAvailable or isStreamingCompatible after this top level decision. That's not to say the useSVEforFixedLengthVectors interface is perfect when considering this newer requirement but for now it seems to be working ok. | |
llvm/test/CodeGen/AArch64/sve-fp-reduce-fadda.ll | ||
4–6 | Fair enough. | |
llvm/test/CodeGen/AArch64/sve-intrinsics-fp-reduce-fadda.ll | ||
4–5 | As discuss whilst protecting the common intrinsic has value the same is not true for the target specific SVE intrinsics. |
Here you are also changing the logic.
Before for SVE FADD was custom, now it can be legal.
So it means we can lower to an assembly, but I don't see that anywhere.
I believe it will crash. Is that the expected behaviour?
I have the same question for line 1508.
I believe it will crash. Is that the expected behaviour?