Adds the llvm.aarch64.sve.ldnf1 intrinsic, adding a new
flag to MachineMemOperand (MONonFaulting)
Details
Diff Detail
Event Timeline
I'm not sure it's legal to transform a non-faulting load to a load with a non-faulting flag? At least, we'd need to consider the implications of that very carefully. In particular, I'm concerned about the interaction with intrinsics that read/write FFR. I mean, you could specify that loads marked MONonFaulting actually write to the FFR register, but that seems confusing.
It seems simpler to preserve the intrinsic until isel, at least for now.
| llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp | ||
|---|---|---|
| 4490 | I'm not sure how this is related. | |
| llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
|---|---|---|
| 329 | This duplicates a lot of code, maybe it makes sense to combine this into a multiclass. I'm thinking of something along the lines of: multiclass load_store_fragments<code pred> {
def : PatFrag<(ops node:$ptr, node:$pred, node:$def),
(masked_ld node:$ptr, undef, node:$pred, node:$def),
pred>;
def _i8 : PatFrag<(ops ...),
((cast<SDPatternOperator>(NAME) ...),
[{ return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i8; }]>;
def _i16 : PatFrag<(ops ...),
((cast<SDPatternOperator>(NAME) ....),
[{ return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i16; }]>;
def _i32 : PatFrag<(ops ...),
((cast<SDPatternOperator>(NAME) ....),
[{ return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i32; }]>;
def _i64 : PatFrag<(ops ...),
((cast<SDPatternOperator>(NAME) ....),
[{ return cast<MaskedLoadSDNode>(N)->getMemoryVT().getScalarType() == MVT::i64; }]>;
}
defm non_temporal_load : load_store_fragments<[{
return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::NON_EXTLOAD &&
cast<MaskedLoadSDNode>(N)->isUnindexed() &&
cast<MaskedLoadSDNode>(N)->isNonTemporal() &&
!cast<MaskedLoadSDNode>(N)->isNonFaulting();
}]>;
defm sext_non_temporal_load : load_store_fragments<[{
return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::SEXTLOAD &&
cast<MaskedLoadSDNode>(N)->isUnindexed() &&
cast<MaskedLoadSDNode>(N)->isNonTemporal() &&
!cast<MaskedLoadSDNode>(N)->isNonFaulting();
}]>;
defm zext_non_faulting_load : load_store_fragments<[{
return cast<MaskedLoadSDNode>(N)->getExtensionType() == ISD::ZEXTLOAD &&
cast<MaskedLoadSDNode>(N)->isUnindexed() &&
!cast<MaskedLoadSDNode>(N)->isNonTemporal() &&
cast<MaskedLoadSDNode>(N)->isNonFaulting();
}]>; | |
| 338 | nit: we should probably add _masked_ to the name (although I realise that I forgot to spot that on the non-temporal patch) | |
| 341 | Shall we support any-extend as well? (thus making asext_non_faulting_load) | |
I missed this comment earlier, but that's a valid point. For SVE having side-effects is assumed from the non-faulting flag. We hoped to latch on to the MLOAD here to reuse code and benefit from legalization in case we want to add a more generic mechanism in the future to use such loads directly in the loop-vectorizer.
Perhaps we can clarify the intent that the non-faulting mode may have side-effects by renaming the flag to something like NonFaultingWithSideEffects? Otherwise we can stick with the intrinsics as you suggest.
Perhaps we can clarify the intent that the non-faulting mode may have side-effects by renaming the flag to something like NonFaultingWithSideEffects?
We could specify something like that... but that probably requires some changes to target-independent DAGCombine, and it's sort of a weird edge case given that no other loads have that kind of side-effect. If we are going to make a change like that to MachineMemOperand, we'd probably want to propose it on llvmdev. I'd prefer a little extra code here to avoid that rabbit hole. We're busy enough already with other substantial changes to target-independent code.
Thanks for the feedback on this patch, @efriedma & @sdesmalen!
I think there is still value in adding a NonFaulting flag to MachineMemOperand so that we can benefit from legalisation, but as this is not a requirement for the ACLE I have created a new patch which implements the non-faulting load intrinsic explicitly: https://reviews.llvm.org/D71698
I will leave this patch in the 'plan changes' state so that it can be referred to in future discussions on the mailing list.
I'm not sure how this is related.